Abstract

Consider a hidden Markov chain obtained as the observation process of an ordinary 
Markov chain corrupted by noise. Zuk et al. [13, 14] showed how, in principle, one
can explicitly compute the derivatives of the entropy rate at extreme values of the
noise. Namely, they showed that the derivatives of standard upper approximations to 
the entropy rate actually stabilize at an explicit finite time. We generalize this result 
to a natural class of hidden Markov chains called "Black Holes." We also discuss in 
depth special cases of binary Markov chains observed in binary symmetric noise, and 
give an abstract formula for the first derivative in terms of a measure on the simplex 
due to Blackwell. 



1 Introduction 

As in [2], let $Y = (Y_i)_{i=-\infty}^{\infty}$ be a stationary Markov chain with a finite state alphabet
$\{1, 2, \ldots, B\}$. A function $Z = (Z_i)_{i=-\infty}^{\infty}$ of the Markov chain $Y$ with the form $Z = \Phi(Y)$
is called a hidden Markov chain; here $\Phi$ is a finite valued function defined on $\{1, 2, \ldots, B\}$,
taking values in $\mathcal{A} = \{1, 2, \ldots, A\}$. Let $\Delta$ denote the probability transition matrix for $Y$; it is well
known that the entropy rate $H(Y)$ of $Y$ can be analytically expressed using the stationary
vector of $Y$ and $\Delta$. Let $W$ be the simplex, comprising the vectors

$$\{w = (w_1, w_2, \ldots, w_B) \in \mathbb{R}^B : w_i \ge 0,\ \sum_i w_i = 1\},$$

and let $W_a$ be all $w \in W$ with $w_i = 0$ for $\Phi(i) \neq a$. For $a \in \mathcal{A}$, let $\Delta_a$ denote the $B \times B$
matrix such that $\Delta_a(i,j) = \Delta(i,j)$ for $j$ with $\Phi(j) = a$, and $\Delta_a(i,j) = 0$ otherwise. For
$a \in \mathcal{A}$, define the scalar-valued and vector-valued functions $r_a$ and $f_a$ on $W$ by

$$r_a(w) = w \Delta_a \mathbf{1},$$

and

$$f_a(w) = w \Delta_a / r_a(w).$$

Note that $f_a$ defines the action of the matrix $\Delta_a$ on the simplex $W$.
If $Y$ is irreducible, it turns out that

$$H(Z) = -\int \sum_a r_a(w) \log r_a(w) \, dQ(w), \qquad (1.1)$$

where $Q$ is Blackwell's measure [1] on $W$. This measure is defined as the limiting distribution
of $p(y_0 = \cdot \mid z_{-\infty}^0)$.
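
The maps $r_a$ and $f_a$ can be made concrete with a few lines of code. The following is a minimal sketch in plain Python, not taken from the paper: the 3-state matrix `delta`, the map `phi` and the point `w` are hypothetical values chosen only for illustration, and states and symbols are indexed from 0.

```python
def restrict(delta, phi, a):
    """Delta_a: keep column j of delta when phi[j] == a, zero it otherwise."""
    return [[row[j] if phi[j] == a else 0.0 for j in range(len(row))]
            for row in delta]


def r(w, delta_a):
    """r_a(w) = w * Delta_a * 1, a scalar."""
    cols = len(delta_a[0])
    return sum(w[i] * delta_a[i][j] for i in range(len(w)) for j in range(cols))


def f(w, delta_a):
    """f_a(w) = w * Delta_a / r_a(w), a point of the simplex W."""
    cols = len(delta_a[0])
    v = [sum(w[i] * delta_a[i][j] for i in range(len(w))) for j in range(cols)]
    total = sum(v)
    return [x / total for x in v]


# Hypothetical example: state 0 emits symbol 0, states 1 and 2 emit symbol 1.
delta = [[0.5, 0.25, 0.25], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]]
phi = [0, 1, 1]
w = [0.2, 0.3, 0.5]
```

Since the rows of `delta` sum to one, the values `r(w, restrict(delta, phi, a))` sum to one over the symbols `a`, which is why they can be read as conditional symbol probabilities in (1.1).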

Recently there has been a great deal of work on the entropy rate of a hidden Markov 
chain |H1 El 13 113 II d • See also closely related work [3 [H Hg . 

In Section 2 we establish a "stabilizing" property for the derivatives of the entropy rate
in a family we call "Black Holes". Using this property, one can, in principle, explicitly 
calculate the derivatives of the entropy rate for this case. 

In Section 3 we consider binary Markov chains corrupted by binary symmetric noise. For
this class, we obtain results on the support of Blackwell's measure, and for a special case, 
that we call the "non-overlapping" case, we express the first derivative of the entropy rate as 
the sum of terms, involving Blackwell's measure, which have meaningful interpretations. We 
also show how this expression relates to earlier examples, given in [9], of non-smoothness on
the boundary for this class of hidden Markov chains, and we compute the second derivative 
in an important special case. 



2 Stabilizing Property of Derivatives in Black Hole Case

Suppose that for every $a \in \mathcal{A}$, $\Delta_a$ is a rank one matrix, and every column of $\Delta_a$ is either
strictly positive or all zeros. In this case, the image of $f_a$ is a single point and each $f_a$ is
defined on the whole simplex $W$. Thus we call this the Black Hole case. Analyticity of the
entropy rate at a Black Hole follows from Theorem 1.1 of [2].

In this section we show that, in principle, the coefficients of a Taylor series expansion, 
centered at a Black Hole, can be explicitly computed. This result was motivated by and 
generalizes earlier work by Zuk et al. [13, 14] and Ordentlich-Weissman [10] on cases of
hidden Markov chains obtained by passing a Markov chain through special kinds of channels.
All of the hidden Markov chains considered in [13, 14] are Black Holes.

As an example, consider a hidden Markov chain obtained from a binary Markov chain
corrupted by binary symmetric noise with crossover probability $\varepsilon$ (described in Example 4.1
of [2]). When $\varepsilon = 0$,

$$\Delta = \begin{bmatrix} \pi_{00} & 0 & \pi_{01} & 0 \\ \pi_{00} & 0 & \pi_{01} & 0 \\ \pi_{10} & 0 & \pi_{11} & 0 \\ \pi_{10} & 0 & \pi_{11} & 0 \end{bmatrix};$$

here, the $\pi_{ij}$'s are the Markov transition probabilities, and $\Phi$ maps states 1 and 4 to 0 and maps
states 2 and 3 to 1. In this case, the nonzero entries of $\Delta_0$ and $\Delta_1$ are restricted to a single
column and so both $\Delta_0$ and $\Delta_1$ have rank one. If the $\pi_{ij}$'s are all positive, then this is a Black
Hole case.
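
The Black Hole conditions in this example can be checked mechanically. The sketch below is an illustration under stated assumptions rather than part of the paper: it builds the 4 x 4 matrix for a general crossover probability using the state order (y, e) = (0,0), (0,1), (1,0), (1,1), and the helper names and the sample matrix `pi` are hypothetical.

```python
def bsc_transition_matrix(pi, eps):
    """4x4 matrix on states (y, e); the row depends only on y."""
    rows = []
    for y in (0, 1):
        row = []
        for y_next in (0, 1):
            row += [pi[y][y_next] * (1 - eps), pi[y][y_next] * eps]
        rows += [row, list(row)]
    return rows


PHI = [0, 1, 1, 0]  # states 1 and 4 map to 0, states 2 and 3 map to 1


def restrict(delta, phi, a):
    """Delta_a: keep the columns j with phi[j] == a, zero the others."""
    return [[row[j] if phi[j] == a else 0.0 for j in range(len(row))]
            for row in delta]


def is_rank_one(m, tol=1e-12):
    """Nonzero matrix whose 2x2 minors all vanish."""
    if all(abs(x) <= tol for row in m for x in row):
        return False
    n, k = len(m), len(m[0])
    return all(abs(m[i][j] * m[p][q] - m[i][q] * m[p][j]) <= tol
               for i in range(n) for p in range(i + 1, n)
               for j in range(k) for q in range(j + 1, k))


def columns_positive_or_zero(m):
    """Every column is either strictly positive or identically zero."""
    for j in range(len(m[0])):
        col = [row[j] for row in m]
        if not (all(x > 0 for x in col) or all(x == 0 for x in col)):
            return False
    return True


def is_black_hole(delta, phi):
    return all(is_rank_one(restrict(delta, phi, a))
               and columns_positive_or_zero(restrict(delta, phi, a))
               for a in set(phi))
```

With a strictly positive `pi`, the check succeeds at `eps = 0` and fails for `eps > 0` whenever the determinant of `pi` is nonzero, because each restricted matrix then has two linearly independent nonzero columns.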

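Suppose that $\Delta$ is analytically parameterized by a vector variable $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m)$.
Recall that $H_n(Z)$ is defined as

$$H_n(Z) = H(Z_0 \mid Z_{-n}^{-1}).$$

The following theorem says that at a Black Hole, one can calculate the derivatives of $H(Z)$
by taking the derivatives of $H_n(Z)$ for large enough $n$.

Theorem 2.1. If at $\varepsilon = \hat{\varepsilon}$, for every $a \in \mathcal{A}$, $\Delta_a$ is a rank one matrix, and every column of
$\Delta_a$ is either a positive or a zero column, then for $n = \alpha_1 + \alpha_2 + \cdots + \alpha_m$,

$$\left.\frac{\partial^{\alpha_1+\alpha_2+\cdots+\alpha_m} H(Z)}{\partial \varepsilon_1^{\alpha_1} \partial \varepsilon_2^{\alpha_2} \cdots \partial \varepsilon_m^{\alpha_m}}\right|_{\varepsilon=\hat{\varepsilon}} = \left.\frac{\partial^{\alpha_1+\alpha_2+\cdots+\alpha_m} H_n(Z)}{\partial \varepsilon_1^{\alpha_1} \partial \varepsilon_2^{\alpha_2} \cdots \partial \varepsilon_m^{\alpha_m}}\right|_{\varepsilon=\hat{\varepsilon}}.$$

In fact, we give a stronger result, Theorem 2.6, later in this section.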

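Proof. For simplicity we assume that $\Delta$ is only parameterized by one real variable $\varepsilon$, and we
drop $\varepsilon$ when the implication is clear from the context.

We shall first prove that for all sequences $z_{-\infty}^0$ the $n$-th derivative of $p(z_0 \mid z_{-\infty}^{-1})$ stabilizes:

$$p^{(n)}(z_0 \mid z_{-\infty}^{-1}) = p^{(n)}(z_0 \mid z_{-n-1}^{-1}) \quad \text{at } \varepsilon = \hat{\varepsilon}. \qquad (2.2)$$

Since $p(z_0 \mid z_{-\infty}^{-1}) = p(y_{-1} = \cdot \mid z_{-\infty}^{-1}) \Delta_{z_0} \mathbf{1}$ (here $\cdot$ represents the states of the Markov chain
$Y$), it suffices to prove that for the $n$-th derivative of $x_i = p(y_i = \cdot \mid z_{-\infty}^{i})$, we have

$$x_i^{(n)} = p^{(n)}(y_i = \cdot \mid z_{-\infty}^{i}) = p^{(n)}(y_i = \cdot \mid z_{i-n}^{i}) \quad \text{at } \varepsilon = \hat{\varepsilon}. \qquad (2.3)$$

Consider the iteration:

$$x_i = \frac{x_{i-1} \Delta_{z_i}}{x_{i-1} \Delta_{z_i} \mathbf{1}}.$$

In other words, $x_i$ can be viewed as a function of $x_{i-1}$ and $\Delta_{z_i}$. Let $g$ denote this function.
Since at $\varepsilon = \hat{\varepsilon}$, $\Delta_{z_i}$ is a rank one matrix, we conclude that $g$ is constant as a function of $x_{i-1}$.
Thus at $\varepsilon = \hat{\varepsilon}$,

$$x_i = p(y_i = \cdot \mid z_{-\infty}^{i}) = \frac{x_{i-1} \Delta_{z_i}}{x_{i-1} \Delta_{z_i} \mathbf{1}} = \frac{p(y_{i-1} = \cdot) \Delta_{z_i}}{p(y_{i-1} = \cdot) \Delta_{z_i} \mathbf{1}} = p(y_i = \cdot \mid z_i). \qquad (2.4)$$

Taking the derivative of $g$ with respect to $\varepsilon$, we have at $\varepsilon = \hat{\varepsilon}$

$$x_i' = \frac{\partial g}{\partial \Delta_{z_i}}(x_{i-1}, \Delta_{z_i}) \, \Delta_{z_i}' + \frac{\partial g}{\partial x_{i-1}}(x_{i-1}, \Delta_{z_i}) \, x_{i-1}'.$$

Since at $\varepsilon = \hat{\varepsilon}$, $g$ is a constant as a function of $x_{i-1}$, we have

$$\frac{\partial g}{\partial x_{i-1}}(x_{i-1}, \Delta_{z_i}) = \frac{\partial (\text{a constant vector})}{\partial x_{i-1}} = 0.$$

It then follows from (2.4) that at $\varepsilon = \hat{\varepsilon}$

$$x_i' = p'(y_i = \cdot \mid z_{-\infty}^{i}) = p'(y_i = \cdot \mid z_{i-1}^{i}).$$

Taking higher order derivatives, we have

$$x_i^{(n)} = \frac{\partial g}{\partial x_{i-1}}(x_{i-1}, \Delta_{z_i}) \, x_{i-1}^{(n)} + \text{other terms},$$

where "other terms" involve only lower order (than $n$) derivatives of $x_{i-1}$. By induction, we
conclude that

$$x_i^{(n)} = p^{(n)}(y_i = \cdot \mid z_{-\infty}^{i}) = p^{(n)}(y_i = \cdot \mid z_{i-n}^{i})$$

at $\varepsilon = \hat{\varepsilon}$. We then have (2.3) and therefore (2.2) as desired.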

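By the proof of Theorem 1.1 of [2], the complexified $H_n(Z)$ uniformly converges to the
complexified $H(Z)$, and so we can switch the limit operation and the derivative operation.
Thus, at all $\varepsilon$,

$$H'(Z) = \Big( -\lim_{k \to \infty} \sum_{z_{-k}^0} p(z_{-k}^0) \log p(z_0 \mid z_{-k}^{-1}) \Big)' = -\lim_{k \to \infty} \sum_{z_{-k}^0} \Big( p'(z_{-k}^0) \log p(z_0 \mid z_{-k}^{-1}) + p(z_{-k}^0) \frac{p'(z_0 \mid z_{-k}^{-1})}{p(z_0 \mid z_{-k}^{-1})} \Big).$$

Since

$$\sum_{z_{-k}^0} p(z_{-k}^0) \frac{p'(z_0 \mid z_{-k}^{-1})}{p(z_0 \mid z_{-k}^{-1})} = \sum_{z_{-k}^{-1}} p(z_{-k}^{-1}) \sum_{z_0} p'(z_0 \mid z_{-k}^{-1}) = 0,$$

we have for all $\varepsilon$

$$H'(Z) = -\lim_{k \to \infty} \sum_{z_{-k}^0} p'(z_{-k}^0) \log p(z_0 \mid z_{-k}^{-1}). \qquad (2.5)$$

At $\varepsilon = \hat{\varepsilon}$, we obtain:

$$H'(Z) = -\lim_{k \to \infty} \sum_{z_{-k}^0} p'(z_{-k}^0) \log p(z_0 \mid z_{-1}) = -\sum_{z_{-1}^0} p'(z_{-1}^0) \log p(z_0 \mid z_{-1}) = H_1'(Z).$$

For higher order derivatives, again using the fact that we can interchange the order of limit
and derivative operations and using (2.5) and Leibnitz formula, we have for all $\varepsilon$

$$H^{(n)}(Z) = -\lim_{k \to \infty} \sum_{z_{-k}^0} \sum_{l=1}^{n} C_{n-1}^{l-1} \, p^{(l)}(z_{-k}^0) \, \big(\log p(z_0 \mid z_{-k}^{-1})\big)^{(n-l)}$$

(the use of (2.5) accounts for the fact that there is no $l = 0$ term in this expression). Note
that the term $(\log p(z_0 \mid z_{-k}^{-1}))^{(n-l)}$ involves only the lower order (less than or equal to $n - 1$)
derivatives of $p(z_0 \mid z_{-k}^{-1})$, which are already "stabilizing" in the sense of (2.2); so, at $\varepsilon = \hat{\varepsilon}$, we have

$$H^{(n)}(Z) = -\lim_{k \to \infty} \sum_{z_{-k}^0} \sum_{l=1}^{n} C_{n-1}^{l-1} \, p^{(l)}(z_{-k}^0) \, \big(\log p(z_0 \mid z_{-n}^{-1})\big)^{(n-l)} = -\sum_{z_{-n}^0} \sum_{l=1}^{n} C_{n-1}^{l-1} \, p^{(l)}(z_{-n}^0) \, \big(\log p(z_0 \mid z_{-n}^{-1})\big)^{(n-l)} = H_n^{(n)}(Z).$$

We thus prove the theorem. □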
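
The stabilization of the first derivative can be observed numerically in the binary symmetric noise example. The sketch below is an illustration under assumptions, not the authors' computation: it evaluates $H_n(Z)$ by brute-force enumeration of words, uses natural logarithms, and approximates the derivative at $\varepsilon = 0$ with a one-sided second-order finite difference. The function names, the step `h` and the sample matrix are hypothetical choices.

```python
import itertools
import math


def word_probability(z, pi, eps):
    """p(z_1, ..., z_k) for a stationary binary chain seen through a BSC."""
    mu0 = pi[1][0] / (pi[0][1] + pi[1][0])  # stationary P(y = 0)
    mu = (mu0, 1.0 - mu0)

    def emit(y, s):
        return 1.0 - eps if y == s else eps

    alpha = [mu[y] * emit(y, z[0]) for y in (0, 1)]
    for s in z[1:]:
        alpha = [sum(alpha[u] * pi[u][y] for u in (0, 1)) * emit(y, s)
                 for y in (0, 1)]
    return sum(alpha)


def block_entropy(k, pi, eps):
    """Entropy (in nats) of k consecutive output symbols."""
    if k == 0:
        return 0.0
    total = 0.0
    for z in itertools.product((0, 1), repeat=k):
        p = word_probability(z, pi, eps)
        total -= p * math.log(p)
    return total


def H_n(n, pi, eps):
    """H(Z_0 | Z_{-n}^{-1}) as a difference of block entropies."""
    return block_entropy(n + 1, pi, eps) - block_entropy(n, pi, eps)


def derivative_at_zero(n, pi, h=1e-5):
    """One-sided second-order finite difference of H_n at eps = 0."""
    f0, f1, f2 = H_n(n, pi, 0.0), H_n(n, pi, h), H_n(n, pi, 2 * h)
    return (-3.0 * f0 + 4.0 * f1 - f2) / (2.0 * h)
```

For a strictly positive transition matrix, `derivative_at_zero(n, pi)` should agree for n = 1, 2, 3 up to discretization error, while the value for n = 0 (the marginal entropy) is generally different.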



Remark 2.2. It follows from (2.4) that a hidden Markov chain at a Black Hole is, in fact,
a Markov chain. Note that in the argument above the proof of the stabilizing property of
the first derivative (as opposed to higher derivatives) requires only that the hidden Markov
chain is Markov and that we can interchange the order of limit and derivative operations
(instead of the stronger Black Hole property). Therefore if a hidden Markov chain $Z$ defined
by $\Delta$ and $\Phi$ is in fact a Markov chain, and the complexified $H_n(Z)$ uniformly converges to
$H(Z)$ on some neighborhood of $\Delta$ (e.g., if the conditions of Theorem 1.1, 6.1 or 7.5 of [2]
hold), then at $\Delta$, we have

$$H'(Z) = H_1'(Z). \qquad (2.6)$$

For instance, consider the following hidden Markov chain $Z$ defined by

$$\Delta = \begin{bmatrix} 1/4 & 1/4 & 1/2 \\ 0 & 1/6 & 5/6 \\ 7/8 & 1/8 & 0 \end{bmatrix},$$

with $\Phi(1) = 0$ and $\Phi(2) = \Phi(3) = 1$. $Z$ is in fact a Markov chain (see page 134 in [5]), and
one checks that $\Delta$ satisfies the conditions in Theorem 7.5 in [2]. We conclude that for this
example, (2.6) holds.



In the cases studied in the earlier work cited above, the authors obtained, using a finer analysis, a shorter
"stabilizing length." This shorter length can be derived for the Black Hole case as well, as
shown in Theorem 2.6 below, even though the proofs given there do not seem to carry over directly.

We need some preliminary lemmas for the proof of Theorem 2.6.

By induction, one can prove that the formal derivative of y log y takes the following form: 
(ylogyr^ = J2 ^K.a.,....a^, / ^ ^ + 2/(^Hlog2/ + l) 

ai>a2>--->am+i-ai+a2-\ ham+i=A'' 

= E^^"^' E ^K,o.-,o^. / ^ J — +y(^)(iogy+i). 

i=\ a2>a3>->am+i 

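Let $q_i[y]$ denote the "coefficient" of $y^{(i)}$, which is a function of $y$ and its formal derivatives
(up to the $i$-th order derivative). Thus we have

$$(y \log y)^{(N)} = \sum_{l=1}^{N} q_l[y] \, y^{(l)} = \mathrm{High}_N[y] + \mathrm{Low}_N[y],$$

where $\mathrm{High}_N[y] = \sum_{i=\lceil (N+1)/2 \rceil}^{N} q_i[y] \, y^{(i)}$ and $\mathrm{Low}_N[y] = \sum_{i=1}^{\lceil (N-1)/2 \rceil} q_i[y] \, y^{(i)}$.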
In the following, let $P(a_1, a_2, \ldots, a_m)$ denote the number of distinct sequences obtained
by permuting the coordinates of the sequence $(a_1, a_2, \ldots, a_m)$. Namely if

$$a_1 = a_2 = \cdots = a_{m_1} > a_{m_1+1} = \cdots = a_{m_1+m_2} > \cdots > a_{m_1+m_2+\cdots+m_{j-1}+1} = \cdots = a_{m_1+m_2+\cdots+m_j} = a_m, \qquad (2.7)$$

then

$$P(a_1, a_2, \ldots, a_m) = \frac{m!}{m_1! \, m_2! \cdots m_j!}.$$
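
As a quick sanity check of this counting formula, the following sketch (an illustration only; the function name is hypothetical) computes $P(a_1, \ldots, a_m)$ from the multiplicities of repeated entries and can be compared with brute-force enumeration for short sequences.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def count_distinct_permutations(seq):
    """m! / (m_1! m_2! ... m_j!), where the m_i are multiplicities in seq."""
    result = factorial(len(seq))
    for multiplicity in Counter(seq).values():
        result //= factorial(multiplicity)
    return result


def brute_force_count(seq):
    """Number of distinct rearrangements, by explicit enumeration."""
    return len(set(permutations(seq)))
```

For example, the sequence (3, 3, 2, 1, 1) has multiplicities 2, 1, 2 and therefore 5!/(2! 1! 2!) = 30 distinct rearrangements.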



Lemma 2.3. 



ai>«2>--->am>l:ai+a2H |-am=?i+l 

Proof. One checks that C[i] = 1 and C[a-i^^a2,--- ,a^] satisfies the following recursion relationship: 
For ai > a2 > ■ ■ ■ > am. > 2, 

C[ai,a2,-,am] = ^ -^(ai, 02, ■ ■ ■ , ^2, " ' ' , &m)C'[fei,fe2,--- ,6m] ) (2-8) 

where the summation is over all bi > b2 > ■ ■ ■ > bm > 1, and all 6j is equal to except for 
one of them, say bk = — 1, and -D(ai, a2, ■ ■ ■ , ctm; &2, ■ ' ' ? ^m) is defined to the number 
of bk occurring in the sequence of 61, 62, ■ ' ' > ^m- For ai > 02 > ■ ■ ■ > = 1, 

C[ai,a2,- , a™] = X] -0(01,02, ■■ ■ , O^; &1 , ^2 , " " " , &m)C'[b,,fe2,- ,fem] ^ ("^ ~ ,a2,- ,a™-i] ; (2-9) 

again here the summation is over all 61 > 62 > ■ ■ ■ > &m > 1, and all 6j is equal to a, 
except for one of them, say bk = — 1, and D(ai, 02, ■ ■ ■ , a^; &2, ■ ■ ■ , &m) is defined to 
the number of bk occurring in the sequence of 61, 62, ■ " " ? ^m- 
One checks that 

Nm+i 1 _ _ ^(ai + a2H ha™)! 



.l)-+i_P(ai,a2, 



m ai\a2\ ■ ■ ■ am'- 



satisfies the initial value and recursion (2.8) and (2.9). Since the initial value and recursion
uniquely determine the sequence, the lemma then follows. □

Lemma 2.4. For $i = \lceil (N+1)/2 \rceil, \ldots, N$, $q_i[y]$ is proportional to $(\log y + 1)^{(N-i)}$. More
specifically, we have

$$q_i[y] = C_{i,N} \, (\log y + 1)^{(N-i)},$$

where $C_{i,N}$ is an integer.
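
For orientation, the two smallest cases can be written out by direct differentiation. This is a worked illustration, not a computation taken from the paper, and it assumes the splitting into High and Low parts described above:

$$(y \log y)'' = y'' (\log y + 1) + y' \cdot \frac{y'}{y},$$

$$(y \log y)''' = y''' (\log y + 1) + 3 y'' \cdot \frac{y'}{y} - \frac{(y')^3}{y^2}.$$

For $N = 3$ the High part consists of the terms in $y'''$ and $y''$, with coefficients $(\log y + 1)^{(0)}$ and $3 (\log y + 1)^{(1)}$, while the Low part is $-(y')^3 / y^2$. This matches the form $q_i[y] = C_{i,N} (\log y + 1)^{(N-i)}$ with $C_{3,3} = 1$ and $C_{2,3} = 3$.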

Proof. We first prove that for $N = 2k + 1$, the coefficient of $y^{(k+1)}$ is proportional to $z^{(k-1)}$,
where $z = (\log y + 1)' = y'/y$. According to Leibnitz formula, we have

2k 

1=0 

6 



2k— 1 
1=0 

It suffices to prove that the coefficient of ?/('^+^) of 

is C2jl^ z^''^^'^ ■ Applying Lemma 12.31 and collecting terms, we have the coefficient of ?/('^+^) 
equal to 

+ ■■■ + c!tci,^,,,-,iy%^'-'^ /y' + ■■■ + ■ ■■y^'^)/y'- 

Consider the term (y^"^^?/*-"^) ■ ■ ■ y^"'"))/?/™- (here ai + a2 + ■ ■ ■ + am = k) and compute its 
coefficient in the expression above. Assuming that cii > ^2 > ■ ■ ■ > 0,.^ satisfy ()2.7|) . we have 
the coefficient of ?/('^+^): 

^2k '-'[fc+l,a2,--- ,am] '-^2fc '-'[fc+l.ai,--- ,0™-^ ,0^1+2, ■■■ ,cim] + ' ' ' 

„2fc + l— amj+m2H hrrij^i+l „ 

1 f (2A;)! (2A; + l-ai)! m! 



m V(2A;+ 1 - 1)! + l)!a2 
(2/c)! {2k + l-am,+i 



•Om! (mi - l)!m2! ■ ■■rrijl 
ml 



+ •■■ + 



{2k + l- ami+i)!(ami+i - 1)! cti! • • -am^lik + l)!am,+2! ■■■ami mi\{m2 - 1)! ■ ■■rrijl 

(2^)! (2A; + l-a^,+...+^^._,+i)! m! 

(2/c + 1 - a„,+...+m^._^+i)!(am,+...+m^_^+i - 1)! ■ ■ ■ amj+...+„._J(A; + iy.ami+-+m,^+2^- ■ ■ - Om! mi!m2! ■ ■ ■ {rrij - 1)! 

_i 1 (2A;)! ml miai + m2ami+i -\ h mjami+-+m,^-,+i 



-ly 



m {k + 1)1 mil ■ ■ -mjl aila2l ■ ■ ■ ami 

■1) 



m+i 1 (2A;)! (ai + aaH ham)! ml 



m {k + iy.{k — ly. aila2l ■ ■ ■ ami mi!m2!---mj! 

It then follows that the coefficient of ?/('^+^) is equal to €2^^ z^'^^^K 

One can do similar computations to prove that for $N = 2k, 2k + 1$, this lemma holds
for other derivatives. An alternative approach is to use induction. Using the fact that the
coefficient of $y^{(k+1)}$ is proportional to $z^{(k-1)}$ (established above), one can prove by induction
that for the $2k$-th order derivative of $y \log y$, the coefficient of $y^{(l)}$ is proportional to
$(\log y + 1)^{(2k-l)}$ for $l$ with $k + 1 \le l \le 2k$; and for the $(2k+1)$-th order derivative of $y \log y$, the coefficient
of $y^{(l)}$ is proportional to $(\log y + 1)^{(2k+1-l)}$ for $l$ with $k + 2 \le l \le 2k + 1$. □
Lemma 2.5.

$$\mathrm{Low}_N[ax] = \sum_{i=0}^{\lceil (N-1)/2 \rceil} r_i[a] \, x^{(i)} + \sum_{i=0}^{\lceil (N-1)/2 \rceil} s_i[x] \, a^{(i)},$$

where $r_i[a]$ is a function of $a$ and its derivatives (up to order $\lceil (N-1)/2 \rceil$), and $s_i[x]$ is a
function of $x$ and its derivatives (up to order $\lceil (N-1)/2 \rceil$). Also,

$$s_0[x] = \mathrm{Low}_N[x].$$

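Proof. By Leibnitz formula, we have

$$((ax) \log(ax))^{(N)} = \sum_{i=0}^{N} C_N^i \, (ax)^{(i)} (\log(ax))^{(N-i)} = \sum_{i=0}^{N} \sum_{j=0}^{i} C_N^i C_i^j \, a^{(j)} x^{(i-j)} \big( (\log a)^{(N-i)} + (\log x)^{(N-i)} \big).$$

Thus there exist a function of $a$ and its derivatives $t_i[a]$, and a function of $x$ and its derivatives
$w_i[x]$ such that

$$((ax) \log(ax))^{(N)} = \sum_{i=0}^{N} t_i[a] \, x^{(i)} + \sum_{i=0}^{N} w_i[x] \, a^{(i)}$$

with $w_0[x] = (x \log x)^{(N)}$.
By Lemma 2.4, we have

$$\mathrm{High}_N[ax] = \sum_{i=\lceil (N+1)/2 \rceil}^{N} q_i[ax] \, (ax)^{(i)} = \sum_{i=\lceil (N+1)/2 \rceil}^{N} C_{i,N} \, (\log a + \log x + 1)^{(N-i)} (ax)^{(i)}.$$

Thus we conclude that there exist a function of $a$ and its derivatives $u_i[a]$, and a function of
$x$ and its derivatives $v_i[x]$ such that

$$\mathrm{High}_N[ax] = \sum_{i=\lceil (N+1)/2 \rceil}^{N} u_i[a] \, x^{(i)} + \sum_{i=\lceil (N+1)/2 \rceil}^{N} v_i[x] \, a^{(i)},$$

with $v_0[x] = \mathrm{High}_N[x]$. Since

$$\mathrm{Low}_N[ax] = ((ax) \log(ax))^{(N)} - \mathrm{High}_N[ax],$$

existence of $r_i[a]$ and $s_i[x]$ then follows, and they depend on the derivatives only up to
$\lceil (N-1)/2 \rceil$, and $s_0[x] = \mathrm{Low}_N[x]$. □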

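Theorem 2.6. If at $\varepsilon = \hat{\varepsilon}$, for every $a \in \mathcal{A}$, $\Delta_a$ is a rank one matrix, and every column of
$\Delta_a$ is either a positive or a zero column, then

$$\left.\frac{\partial^{\alpha_1+\alpha_2+\cdots+\alpha_m} H(Z)}{\partial \varepsilon_1^{\alpha_1} \partial \varepsilon_2^{\alpha_2} \cdots \partial \varepsilon_m^{\alpha_m}}\right|_{\varepsilon=\hat{\varepsilon}} = \left.\frac{\partial^{\alpha_1+\alpha_2+\cdots+\alpha_m} H_{\lceil (\alpha_1+\alpha_2+\cdots+\alpha_m+1)/2 \rceil}(Z)}{\partial \varepsilon_1^{\alpha_1} \partial \varepsilon_2^{\alpha_2} \cdots \partial \varepsilon_m^{\alpha_m}}\right|_{\varepsilon=\hat{\varepsilon}}.$$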



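Proof. For simplicity we assume that $\Delta$ is parameterized by only one variable $\varepsilon$, and
we drop $\varepsilon$ when the implication is clear from the context. Recall that

$$H_n(Z) = -\sum_{z_{-n}^0} p(z_{-n}^0) \log p(z_{-n}^0) + \sum_{z_{-n}^{-1}} p(z_{-n}^{-1}) \log p(z_{-n}^{-1}).$$

With slight abuse of notation (replacing the formal derivative with the derivative with
respect to $\varepsilon$, we can define $\mathrm{High}_N[p(z_{-n}^0)]$, and similarly $\mathrm{Low}_N[p(z_{-n}^0)]$,
etc.), we have

$$(p(z_{-n}^0) \log p(z_{-n}^0))^{(N)} = \mathrm{High}_N[p(z_{-n}^0)] + \mathrm{Low}_N[p(z_{-n}^0)],$$
$$(p(z_{-n}^{-1}) \log p(z_{-n}^{-1}))^{(N)} = \mathrm{High}_N[p(z_{-n}^{-1})] + \mathrm{Low}_N[p(z_{-n}^{-1})].$$

Note that by Lemma 2.4 we have

$$\mathrm{High}_N[p(z_{-n}^0)] = \sum_{i=\lceil (N+1)/2 \rceil}^{N} C_{i,N} \, \big(\log p(z_0 \mid z_{-n}^{-1}) + \log p(z_{-n}^{-1}) + 1\big)^{(N-i)} \, p(z_{-n}^0)^{(i)},$$

and

$$\mathrm{High}_N[p(z_{-n}^{-1})] = \sum_{i=\lceil (N+1)/2 \rceil}^{N} C_{i,N} \, \big(\log p(z_{-n}^{-1}) + 1\big)^{(N-i)} \, p(z_{-n}^{-1})^{(i)}.$$

Thus, using $\sum_{z_0} p(z_{-n}^0)^{(i)} = p(z_{-n}^{-1})^{(i)}$,

$$\sum_{z_{-n}^0} \mathrm{High}_N[p(z_{-n}^0)] - \sum_{z_{-n}^{-1}} \mathrm{High}_N[p(z_{-n}^{-1})]$$
$$= \sum_{z_{-n}^0} \sum_{i=\lceil (N+1)/2 \rceil}^{N} C_{i,N} \, \big(\log p(z_0 \mid z_{-n}^{-1}) + \log p(z_{-n}^{-1}) - \log p(z_{-n}^{-1})\big)^{(N-i)} \, p(z_{-n}^0)^{(i)}$$
$$= \sum_{z_{-n}^0} \sum_{i=\lceil (N+1)/2 \rceil}^{N} C_{i,N} \, \big(\log p(z_0 \mid z_{-n}^{-1})\big)^{(N-i)} \, p(z_{-n}^0)^{(i)}.$$

By (2.2), the derivatives of $\log p(z_0 \mid z_{-n}^{-1})$ of order at most $N - \lceil (N+1)/2 \rceil$ appearing here are already stabilizing.
So the higher derivative part stabilizes at $\lceil (N+1)/2 \rceil$, namely for any $n \ge \lceil (N+1)/2 \rceil$,
$\sum_{z_{-n}^0} \mathrm{High}_N[p(z_{-n}^0)] - \sum_{z_{-n}^{-1}} \mathrm{High}_N[p(z_{-n}^{-1})]$ is equal to
$\sum_{z_{-\lceil (N+1)/2 \rceil}^0} \mathrm{High}_N[p(z_{-\lceil (N+1)/2 \rceil}^0)] - \sum_{z_{-\lceil (N+1)/2 \rceil}^{-1}} \mathrm{High}_N[p(z_{-\lceil (N+1)/2 \rceil}^{-1})]$.

By Lemma 2.5, we have

$$\mathrm{Low}_N[p(z_{-n}^0)] = \sum_{i=0}^{\lceil (N-1)/2 \rceil} r_i[p(z_0 \mid z_{-n}^{-1})] \, p(z_{-n}^{-1})^{(i)} + \sum_{i=0}^{\lceil (N-1)/2 \rceil} s_i[p(z_{-n}^{-1})] \, p(z_0 \mid z_{-n}^{-1})^{(i)},$$

with $s_0[p(z_{-n}^{-1})] = \mathrm{Low}_N[p(z_{-n}^{-1})]$. Since $\sum_{z_0} p(z_0 \mid z_{-n}^{-1})^{(i)}$ equals 1 for $i = 0$ and 0 for $i \ge 1$, we obtain, for $n \ge \lceil (N+1)/2 \rceil$,

$$\sum_{z_{-n}^0} \mathrm{Low}_N[p(z_{-n}^0)] - \sum_{z_{-n}^{-1}} \mathrm{Low}_N[p(z_{-n}^{-1})] = \sum_{z_{-n}^0} \sum_{i=0}^{\lceil (N-1)/2 \rceil} r_i[p(z_0 \mid z_{-n}^{-1})] \, p(z_{-n}^{-1})^{(i)}$$
$$= \sum_{z_{-\lceil (N+1)/2 \rceil}^0} \sum_{i=0}^{\lceil (N-1)/2 \rceil} r_i[p(z_0 \mid z_{-\lceil (N+1)/2 \rceil}^{-1})] \, p(z_{-\lceil (N+1)/2 \rceil}^{-1})^{(i)}.$$

Consequently the lower derivative part stabilizes at $\lceil (N+1)/2 \rceil$ as well, namely for any $n \ge
\lceil (N+1)/2 \rceil$, $\sum_{z_{-n}^0} \mathrm{Low}_N[p(z_{-n}^0)] - \sum_{z_{-n}^{-1}} \mathrm{Low}_N[p(z_{-n}^{-1})]$ is equal to
$\sum_{z_{-\lceil (N+1)/2 \rceil}^0} \mathrm{Low}_N[p(z_{-\lceil (N+1)/2 \rceil}^0)] - \sum_{z_{-\lceil (N+1)/2 \rceil}^{-1}} \mathrm{Low}_N[p(z_{-\lceil (N+1)/2 \rceil}^{-1})]$. The theorem then follows. □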



Remark 2.7. For an irreducible stationary Markov chain $Y$ with probability transition
matrix $\Delta$, let $Y^{-1}$ denote its reverse Markov chain. It is well known that the probability
transition matrix of $Y^{-1}$ is $\mathrm{diag}(\pi_1^{-1}, \pi_2^{-1}, \ldots, \pi_B^{-1}) \, \Delta^{*} \, \mathrm{diag}(\pi_1, \pi_2, \ldots, \pi_B)$, where $\Delta^{*}$ denotes
the transpose of $\Delta$ and $(\pi_1, \pi_2, \ldots, \pi_B)$ is the stationary vector of $Y$. Therefore if $\Delta^{*}$ is a
Black Hole case, the derivatives of $H(Z^{-1})$ (here, $Z^{-1}$ is the reverse hidden Markov chain
defined by $Z^{-1} = \Phi(Y^{-1})$) also stabilize. It then follows from $H(Z) = H(Z^{-1})$ that the
derivatives of $H(Z)$ also stabilize.



3 Binary Markov Chains Corrupted by Binary Symmetric Noise

In this section, we further study hidden Markov chains obtained by binary Markov chains
corrupted by binary symmetric noise with crossover probability $\varepsilon$ (described in Example 4.1
of [2]). We take a concrete approach to study $H(Z)$, and we will "compute" $H'(Z)$ in terms
of Blackwell's measure.

Here the Markov chain is defined by a $2 \times 2$ stochastic matrix $\Pi = [\pi_{ij}]$ (the reader should
not confuse $\Pi$ with the $4 \times 4$ matrix $\Delta$:

$$\begin{bmatrix} \pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\ \pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\ \pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon \\ \pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon \end{bmatrix},$$

which defines the hidden Markov chain via a deterministic function).

When $\det(\Pi) = 0$, the rows of $\Pi$ are identical, and so $Y$ is an i.i.d. random sequence
with distribution $(\pi_{00}, \pi_{01})$. Thus, $Z$ is an i.i.d. random sequence with distribution $(\pi, 1 - \pi)$
where $\pi = \pi_{00}(1 - \varepsilon) + \pi_{01}\varepsilon$. So,

$$H(Z) = -\pi \log \pi - (1 - \pi) \log(1 - \pi).$$

From now through the end of Section 3.2, we assume:

• $\det(\Pi) > 0$, and

• all $\pi_{ij} > 0$, and

• $\varepsilon > 0$.

We remark that the condition $\det(\Pi) > 0$ is purely for convenience. Results in this section
will hold with the condition $\det(\Pi) < 0$ through similar arguments, unless specified
otherwise.

The integral formula expresses H{Z) in terms of the measure Q on the 4-dimensional 
simplex; namely Q is the distribution of p{{yo,eo)\z'^^). However, in the case under con- 
sideration, H{Z) can be expressed as an integral on the real line |HI, which we review as 
follows. 

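From the chain rule of probability theory,

$$p(z_1^i, y_i) = p(z_1^{i-1}, z_i, y_{i-1} = 0, y_i) + p(z_1^{i-1}, z_i, y_{i-1} = 1, y_i)$$
$$= p(z_i, y_i \mid z_1^{i-1}, y_{i-1} = 0) \, p(z_1^{i-1}, y_{i-1} = 0) + p(z_i, y_i \mid z_1^{i-1}, y_{i-1} = 1) \, p(z_1^{i-1}, y_{i-1} = 1),$$

and

$$p(z_i, y_i \mid z_1^{i-1}, y_{i-1} = 0) = p(z_i \mid z_1^{i-1}, y_i, y_{i-1} = 0) \, p(y_i \mid z_1^{i-1}, y_{i-1} = 0)$$
$$= p(z_i \mid y_i) \, p(y_i \mid y_{i-1} = 0) = p_E(e_i) \, p(y_i \mid y_{i-1} = 0).$$

Let $a_i = p(z_1^i, y_i = 0)$ and $b_i = p(z_1^i, y_i = 1)$. Writing $\bar{z} = 1 - z$, the pair $(a_i, b_i)$ satisfies the following dynamical
system:

$$a_i = p_E(z_i) \pi_{00} a_{i-1} + p_E(z_i) \pi_{10} b_{i-1},$$
$$b_i = p_E(\bar{z}_i) \pi_{01} a_{i-1} + p_E(\bar{z}_i) \pi_{11} b_{i-1}.$$

Let $x_i = a_i / b_i$; we have a dynamical system with just one variable:

$$x_i = f_{z_i}(x_{i-1}),$$

where

$$f_z(x) = \frac{p_E(z)}{p_E(\bar{z})} \cdot \frac{\pi_{00} x + \pi_{10}}{\pi_{01} x + \pi_{11}},$$

starting with

$$x_0 = \pi_{10} / \pi_{01}.$$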

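We are interested in the invariant distribution of $x_n$, which is closely related to Blackwell's
distribution of $p((y_0, e_0) \mid z_{-\infty}^0)$. Now

$$p(y_i = 0 \mid z_1^{i-1}) = p(y_i = 0, y_{i-1} = 0 \mid z_1^{i-1}) + p(y_i = 0, y_{i-1} = 1 \mid z_1^{i-1})$$
$$= \pi_{00} \, p(y_{i-1} = 0 \mid z_1^{i-1}) + \pi_{10} \, p(y_{i-1} = 1 \mid z_1^{i-1})$$
$$= \pi_{00} \frac{a_{i-1}}{a_{i-1} + b_{i-1}} + \pi_{10} \frac{b_{i-1}}{a_{i-1} + b_{i-1}} = \pi_{00} \frac{x_{i-1}}{1 + x_{i-1}} + \pi_{10} \frac{1}{1 + x_{i-1}}.$$

Similarly we have

$$p(y_i = 1 \mid z_1^{i-1}) = p(y_i = 1, y_{i-1} = 0 \mid z_1^{i-1}) + p(y_i = 1, y_{i-1} = 1 \mid z_1^{i-1}) = \pi_{01} \frac{x_{i-1}}{1 + x_{i-1}} + \pi_{11} \frac{1}{1 + x_{i-1}}.$$

Further computation leads to

$$p(z_i = 0 \mid z_1^{i-1}) = p(y_i = 0, e_i = 0 \mid z_1^{i-1}) + p(y_i = 1, e_i = 1 \mid z_1^{i-1})$$
$$= p(e_i = 0) \, p(y_i = 0 \mid z_1^{i-1}) + p(e_i = 1) \, p(y_i = 1 \mid z_1^{i-1})$$
$$= ((1 - \varepsilon)\pi_{00} + \varepsilon\pi_{01}) \frac{x_{i-1}}{1 + x_{i-1}} + ((1 - \varepsilon)\pi_{10} + \varepsilon\pi_{11}) \frac{1}{1 + x_{i-1}} = r_0(x_{i-1}),$$

where

$$r_0(x) = \frac{((1 - \varepsilon)\pi_{00} + \varepsilon\pi_{01}) x + ((1 - \varepsilon)\pi_{10} + \varepsilon\pi_{11})}{x + 1}. \qquad (3.10)$$

Similarly we have

$$p(z_i = 1 \mid z_1^{i-1}) = p(y_i = 0, e_i = 1 \mid z_1^{i-1}) + p(y_i = 1, e_i = 0 \mid z_1^{i-1})$$
$$= p(e_i = 1) \, p(y_i = 0 \mid z_1^{i-1}) + p(e_i = 0) \, p(y_i = 1 \mid z_1^{i-1})$$
$$= (\varepsilon\pi_{00} + (1 - \varepsilon)\pi_{01}) \frac{x_{i-1}}{1 + x_{i-1}} + (\varepsilon\pi_{10} + (1 - \varepsilon)\pi_{11}) \frac{1}{1 + x_{i-1}} = r_1(x_{i-1}),$$

where

$$r_1(x) = \frac{(\varepsilon\pi_{00} + (1 - \varepsilon)\pi_{01}) x + (\varepsilon\pi_{10} + (1 - \varepsilon)\pi_{11})}{x + 1}. \qquad (3.11)$$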

Now we write

$$p(x_i \in E \mid x_{i-1}) = \sum_{\{a \,:\, f_a(x_{i-1}) \in E\}} p(z_i = a \mid x_{i-1}).$$

Note that

$$p(z_i = 0 \mid x_{i-1}) = p(z_i = 0 \mid z_1^{i-1}) = r_0(x_{i-1}),$$
$$p(z_i = 1 \mid x_{i-1}) = p(z_i = 1 \mid z_1^{i-1}) = r_1(x_{i-1}).$$

The analysis above leads to

$$p(x_i \in E) = \int_{f_0^{-1}(E)} r_0(x_{i-1}) \, dp(x_{i-1}) + \int_{f_1^{-1}(E)} r_1(x_{i-1}) \, dp(x_{i-1}).$$

Abusing notation, we let $Q$ denote the limiting distribution of $x_i$ (the limiting distribution
exists due to the martingale convergence theorem) and obtain:

$$Q(E) = \int_{f_0^{-1}(E)} r_0(x) \, dQ(x) + \int_{f_1^{-1}(E)} r_1(x) \, dQ(x). \qquad (3.12)$$

We may now compute the entropy rate of $Z$ in terms of $Q$. Note that

$$E(\log p(z_i \mid z_1^{i-1})) = E\big(p(z_i = 0 \mid z_1^{i-1}) \log p(z_i = 0 \mid z_1^{i-1}) + p(z_i = 1 \mid z_1^{i-1}) \log p(z_i = 1 \mid z_1^{i-1})\big)$$
$$= E\big(r_0(x_{i-1}) \log r_0(x_{i-1}) + r_1(x_{i-1}) \log r_1(x_{i-1})\big).$$

Thus (1.1) becomes

$$H(Z) = -\int \big(r_0(x) \log r_0(x) + r_1(x) \log r_1(x)\big) \, dQ(x). \qquad (3.13)$$
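
The one-variable recursion and the integral formula can be cross-checked numerically at finite depth. The sketch below is an illustration under assumptions, not the authors' procedure: it enumerates all words of length n, tracks the pair $(a_n, b_n)$, and compares the sum of $p(z) \, r(x_n(z))$ over words with $H(Z_{n+1} \mid Z_1^n)$ computed from block entropies. Natural logarithms are used, and the function names and sample parameters are hypothetical.

```python
import itertools
import math


def step(a, b, z, pi, eps):
    """Update (a, b) = (p(z_1^i, y_i = 0), p(z_1^i, y_i = 1)) with symbol z."""
    def p_e(e):
        return eps if e == 1 else 1.0 - eps

    a_new = p_e(z) * (pi[0][0] * a + pi[1][0] * b)
    b_new = p_e(1 - z) * (pi[0][1] * a + pi[1][1] * b)
    return a_new, b_new


def r_pair(x, pi, eps):
    """(r_0(x), r_1(x)): conditional probabilities of the next output symbol."""
    r0 = (((1 - eps) * pi[0][0] + eps * pi[0][1]) * x
          + ((1 - eps) * pi[1][0] + eps * pi[1][1])) / (x + 1)
    r1 = ((eps * pi[0][0] + (1 - eps) * pi[0][1]) * x
          + (eps * pi[1][0] + (1 - eps) * pi[1][1])) / (x + 1)
    return r0, r1


def atoms(n, pi, eps):
    """List of (x_n(z), p(z)) over all words z of length n."""
    mu0 = pi[1][0] / (pi[0][1] + pi[1][0])  # stationary P(y = 0)
    out = []
    for z in itertools.product((0, 1), repeat=n):
        a, b = mu0, 1.0 - mu0
        for s in z:
            a, b = step(a, b, s, pi, eps)
        out.append((a / b, a + b))
    return out


def entropy_via_atoms(n, pi, eps):
    """Sum over words of p(z) * r(x_n(z)), a finite-depth version of (3.13)."""
    total = 0.0
    for x, p in atoms(n, pi, eps):
        r0, r1 = r_pair(x, pi, eps)
        total -= p * (r0 * math.log(r0) + r1 * math.log(r1))
    return total


def block_entropy(k, pi, eps):
    """Entropy (in nats) of k consecutive output symbols."""
    return -sum(p * math.log(p) for _, p in atoms(k, pi, eps))
```

For any n, `entropy_via_atoms(n, pi, eps)` should coincide with `block_entropy(n + 1, pi, eps) - block_entropy(n, pi, eps)` up to rounding, which is a useful check on the formulas for $r_0$ and $r_1$.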

3.1 Properties of Q 

Since $\det(\Pi) > 0$, $f_0$ and $f_1$ are increasing continuous functions bounded from above, and
$f_0(0)$ and $f_1(0)$ are positive; therefore they each have a unique positive fixed point, $p_0$ and
$p_1$. Since $f_1$ is dominated by $f_0$, we conclude $p_1 \le p_0$. Let

• $I$ denote the interval $[p_1, p_0]$, and

• $L = \bigcup_{n=1}^{\infty} L_n$ where

$$L_n = \{ f_{i_1} \circ f_{i_2} \circ \cdots \circ f_{i_n}(p_j) \mid i_1, i_2, \ldots, i_n \in \{0, 1\},\ j = 0, 1 \}.$$

Let $I_{i_1 i_2 \cdots i_n}$ denote $f_{i_n} \circ f_{i_{n-1}} \circ \cdots \circ f_{i_1}(I)$, and $p_{i_1 i_2 \cdots i_n}$ denote $p(z_1 = i_1, z_2 = i_2, \ldots, z_n = i_n)$.
The support of a probability measure $Q$, denoted $\mathrm{supp}(Q)$, is defined as the smallest
closed subset with measure one.

Theorem 3.1. $\mathrm{supp}(Q) = \bar{L}$, the closure of $L$.

Proof. First, by straightforward computation, one can check that $f_0'(p_0)$ and $f_1'(p_1)$ are both
less than 1. Thus, $p_0$ and $p_1$ are attracting fixed points. Since $p_i$ is the unique positive
fixed point of $f_i$, it follows that the entire positive half of the real line is in the domain
of attraction of each $f_i$, i.e. for any $p > 0$, $f_i^{(n)}(p)$ approaches $p_i$ (here the superscript $(n)$
denotes the composition of $n$ copies of the function).

We claim that both $p_0$ and $p_1$ are in $\mathrm{supp}(Q)$. If $p_0$ is not in the support, then there
is a neighborhood $I_{p_0}$ containing $p_0$ with $Q$-measure 0. For any point $p > 0$, for some $n$,
$f_0^{(n)}(p) \in I_{p_0}$. Thus, by Equation (3.12), there is a neighborhood of $p$ with $Q$-measure 0. It
follows that $Q([0, \infty)) = 0$. On the other hand, $Q$ is the limiting distribution of $x_i > 0$ and
so $Q([0, \infty)) = 1$. This contradiction shows that $p_0 \in \mathrm{supp}(Q)$. Similarly, $p_1 \in \mathrm{supp}(Q)$.

By Equation (3.12), we deduce

$$f_i(\mathrm{supp}(Q)) \subseteq \mathrm{supp}(Q).$$

It follows that $L \subseteq \mathrm{supp}(Q)$. Thus $\bar{L} \subseteq \mathrm{supp}(Q)$, since the support is closed.

Since $f_i((0, \infty))$ is contained in a compact set, we may assume $f_i$ is a contraction
mapping (otherwise compose $f_0$ or $f_1$ sufficiently many times to make the composite mapping
a contraction as we argued in [2]). In this case the set of accumulation points of
$\{ f_{i_n} \circ f_{i_{n-1}} \circ \cdots \circ f_{i_1}(p) \mid i_1, i_2, \ldots, i_n \in \{0, 1\},\ p > 0 \}$ does not depend on $p$. Since any point in
$\mathrm{supp}(Q)$ has to be an accumulation point of $\{ f_{i_n} \circ f_{i_{n-1}} \circ \cdots \circ f_{i_1}(\pi_{10}/\pi_{01}) \mid i_1, i_2, \ldots, i_n \in \{0, 1\} \}$,
it has to be an accumulation point of $L$ as well, which implies $\mathrm{supp}(Q) \subseteq \bar{L}$. □

It is easy to see that:

Lemma 3.2. The following statements are equivalent.

1. $f_0(I) \cup f_1(I) \subsetneq I$.

2. $f_0(I) \cap f_1(I) = \emptyset$.

3. $f_1(p_0) < f_0(p_1)$.
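
Condition 3 is easy to test numerically, since the fixed points of $f_0$ and $f_1$ solve a quadratic equation. The sketch below is an illustration under assumptions rather than part of the paper: it uses the form of $f_z$ given earlier in this section, and the function names and the sample matrix are hypothetical.

```python
import math


def ratio(z, eps):
    """p_E(z) / p_E(1 - z) for binary symmetric noise."""
    return (1.0 - eps) / eps if z == 0 else eps / (1.0 - eps)


def f(z, x, pi, eps):
    """f_z(x) = ratio * (pi00 x + pi10) / (pi01 x + pi11)."""
    return ratio(z, eps) * (pi[0][0] * x + pi[1][0]) / (pi[0][1] * x + pi[1][1])


def fixed_point(z, pi, eps):
    """Positive root of pi01 x^2 + (pi11 - c pi00) x - c pi10 = 0."""
    c = ratio(z, eps)
    qa = pi[0][1]
    qb = pi[1][1] - c * pi[0][0]
    qc = -c * pi[1][0]
    return (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)


def non_overlapping(pi, eps):
    """Condition 3 of Lemma 3.2: f_1(p_0) < f_0(p_1)."""
    p0 = fixed_point(0, pi, eps)
    p1 = fixed_point(1, pi, eps)
    return f(1, p0, pi, eps) < f(0, p1, pi, eps)
```

For the sample matrix with rows (0.9, 0.1) and (0.3, 0.7), the condition holds at a small crossover probability such as 0.01 and fails at 0.4, so both cases of the dichotomy can occur for the same chain.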

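Theorem 3.3. $\mathrm{supp}(Q)$ is either a Cantor set or a closed interval. Specifically:

1. $\mathrm{supp}(Q)$ is a Cantor set if $f_0(I) \cup f_1(I) \subsetneq I$.

2. $\mathrm{supp}(Q) = I$ if $f_0(I) \cap f_1(I) \neq \emptyset$, or equivalently $f_0(I) \cup f_1(I) = I$.

Proof. Suppose that $f_0(I) \cup f_1(I) \subsetneq I$. If $(i_1, i_2, \ldots, i_n) \neq (j_1, j_2, \ldots, j_n)$, then

$$I_{i_1 i_2 \cdots i_n} \cap I_{j_1 j_2 \cdots j_n} = \emptyset.$$

Define:

$$I_{\langle n \rangle} = \bigcup_{i_1, i_2, \ldots, i_n} I_{i_1 i_2 \cdots i_n}.$$

Alternatively we can construct $I_{\langle n \rangle}$ as follows: let $I^{d} = (f_1(p_0), f_0(p_1))$, then

$$I_{\langle n+1 \rangle} = I_{\langle n \rangle} \setminus \bigcup_{i_1, i_2, \ldots, i_n} f_{i_n} \circ f_{i_{n-1}} \circ \cdots \circ f_{i_1}(I^{d}).$$

Let $I_{\langle \infty \rangle} = \bigcap_{n=1}^{\infty} I_{\langle n \rangle}$. It follows from the way it is constructed that $I_{\langle \infty \rangle}$ is a Cantor set
(think of $I^{d}$ as a "deleted" interval), and $\bar{L} = I_{\langle \infty \rangle}$. Thus by Theorem 3.1, $\mathrm{supp}(Q) = \bar{L}$ is
a Cantor set.

Suppose $f_0(I) \cup f_1(I) = I$. In this case, for any point $p \in I$, and for all $n$, there exists
$i_1, i_2, \ldots, i_n$ such that

$$p \in I_{i_1 i_2 \cdots i_n}.$$

From the fact that $f_0$ and $f_1$ are both contraction mappings (again, otherwise compose $f_0$
or $f_1$ sufficiently many times to make the composite mapping a contraction as we argued in [2]),
we deduce that the length of $I_{i_1 i_2 \cdots i_n}$ is exponentially decreasing with respect to $n$. It follows
that $L$ is dense in $I$, and therefore $\mathrm{supp}(Q) = \bar{L} = I$. □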
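Theorem 3.4. $Q$ is a continuous measure, namely for any point $p \in \mathrm{supp}(Q)$, and for any
$\eta > 0$, there exists an interval $I_p$ containing $p$ with $Q(I_p) < \eta$ (or equivalently $Q$ has no point
mass).

Proof. Assume that there exists $p \in I$ such that for any interval $I_p$ containing $p$, $Q(I_p) \ge \eta_0$,
where $\eta_0$ is a positive constant. Let $\xi = \max\{ r_0(x), r_1(x) : x \in I \}$. One checks that
$0 < \xi < 1$. By (3.12), we have

$$\frac{1}{\xi} Q(I_p) \le Q(f_0^{-1}(I_p)) + Q(f_1^{-1}(I_p)).$$

Iterating, we obtain

$$\frac{1}{\xi^n} Q(I_p) \le \sum_{i_1, i_2, \ldots, i_n} Q\big(f_{i_n}^{-1} \circ f_{i_{n-1}}^{-1} \circ \cdots \circ f_{i_1}^{-1}(I_p)\big).$$

For fixed $n$, if we choose $I_p$ small enough, then

$$f_{i_n}^{-1} \circ \cdots \circ f_{i_1}^{-1}(I_p) \cap f_{j_n}^{-1} \circ \cdots \circ f_{j_1}^{-1}(I_p) = \emptyset$$

for $(i_1, i_2, \ldots, i_n) \neq (j_1, j_2, \ldots, j_n)$. It follows in this case that

$$Q(I) \ge \sum_{i_1, i_2, \ldots, i_n} Q\big(f_{i_n}^{-1} \circ \cdots \circ f_{i_1}^{-1}(I_p)\big) \ge \frac{\eta_0}{\xi^n}.$$

Therefore for large $n$, we deduce

$$Q(I) > 1,$$

which contradicts the fact that $Q$ is a probability measure. □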

By virtue of Lemma 3.2, it makes sense to refer to case 1 in Theorem 3.3 as the
non-overlapping case. We now focus on this case. Note that this is the case whenever $\varepsilon$ is
sufficiently small; also, it turns out that for some values of the $\pi_{ij}$'s, the non-overlapping case
holds for all $\varepsilon$.

Starting with $x_0 = \pi_{10}/\pi_{01}$, and iterating according to $x_n = f_{z_n}(x_{n-1})$, each word
$z = z_1, z_2, \ldots, z_n$ determines a point $x_n = x_n(z)$ with probability $p(z_1, z_2, \ldots, z_n)$. In the
non-overlapping case, the map $z \mapsto x_n$ is one-to-one. We order the distinct points $\{x_n\}$ from
left to right as

$$x_{n,1}, x_{n,2}, \ldots, x_{n,2^n}$$

with the associated probabilities

$$p_{n,1}, p_{n,2}, \ldots, p_{n,2^n}.$$

This defines a sequence of distributions $Q_n$ which converge weakly to $Q$. In particular, by
the continuity of $Q$, $Q_n(J) \to Q(J)$ for any interval $J$.

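Theorem 3.5. In the non-overlapping case,

$$Q(I_{i_1 i_2 \cdots i_n}) = p_{i_1 i_2 \cdots i_n}.$$

Proof. We have

$$Q_n(I_{i_1 i_2 \cdots i_n}) = p(z_1 = i_1, z_2 = i_2, \ldots, z_n = i_n).$$

Furthermore

$$Q_{n+1}(I_{i_1 i_2 \cdots i_n}) = Q_{n+1}(I_{0 i_1 i_2 \cdots i_n}) + Q_{n+1}(I_{1 i_1 i_2 \cdots i_n})$$
$$= p(z_0 = 0, z_1 = i_1, z_2 = i_2, \ldots, z_n = i_n) + p(z_0 = 1, z_1 = i_1, z_2 = i_2, \ldots, z_n = i_n)$$
$$= p(z_1 = i_1, z_2 = i_2, \ldots, z_n = i_n).$$

Iterating one shows that for $m \ge n$,

$$Q_m(I_{i_1 i_2 \cdots i_n}) = p(z_1 = i_1, z_2 = i_2, \ldots, z_n = i_n).$$

By the continuity of $Q$ (Theorem 3.4),

$$Q(I_{i_1 i_2 \cdots i_n}) = p_{i_1 i_2 \cdots i_n}.$$ □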

From this, as in |H1 IH] we can derive bounds for the entropy rate. Let 

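$$r(x) = -\big(r_0(x) \log r_0(x) + r_1(x) \log r_1(x)\big).$$

Using (3.13) and Theorem 3.5, we obtain:

Theorem 3.6. In the non-overlapping case,

$$\sum_{i_1, i_2, \ldots, i_n} r^{m}_{i_1 i_2 \cdots i_n} \, p_{i_1 i_2 \cdots i_n} \le H(Z) \le \sum_{i_1, i_2, \ldots, i_n} r^{M}_{i_1 i_2 \cdots i_n} \, p_{i_1 i_2 \cdots i_n},$$

where $r^{m}_{i_1 i_2 \cdots i_n} = \min_{x \in I_{i_1 i_2 \cdots i_n}} r(x)$ and $r^{M}_{i_1 i_2 \cdots i_n} = \max_{x \in I_{i_1 i_2 \cdots i_n}} r(x)$.

Proof. This follows immediately from the formula for the entropy rate $H(Z)$ (3.13). □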

3.2 Computation of the first derivative in non-overlapping case 

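To emphasize the dependence on $\varepsilon$, we write $p_{n,i}(\varepsilon) = p_{n,i}$, $x_{n,i}(\varepsilon) = x_{n,i}$, $p_0(\varepsilon) = p_0$,
$p_1(\varepsilon) = p_1$, and $Q_n(\varepsilon) = Q_n$. Let $F_n(\varepsilon, x)$ denote the cumulative distribution function of
$Q_n(\varepsilon)$. Let $H_n^{\varepsilon}(Z)$ be the finite approximation to $H^{\varepsilon}(Z)$. It can be easily checked that

$$H_n^{\varepsilon}(Z) = \int_I r(\varepsilon, x) \, dQ_n(\varepsilon)$$

and we can rewrite (3.13) as

$$H^{\varepsilon}(Z) = \int_I r(\varepsilon, x) \, dQ(\varepsilon).$$

In Theorem 3.7 we express the derivative of the entropy rate, with respect to $\varepsilon$, as the sum of
four terms which have meaningful interpretations. Essentially we are differentiating $H^{\varepsilon}(Z)$
with respect to $\varepsilon$ under the integral sign, but care must be taken since $Q(\varepsilon)$ is generally
singular and varies with $\varepsilon$.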
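Rewriting $H_n^{\varepsilon}(Z)$ using the Riemann-Stieltjes integral and applying integration by parts, we
obtain

$$H_n^{\varepsilon}(Z) = \int_I r(\varepsilon, x) \, dF_n(\varepsilon, x) = F_n(\varepsilon, x) r(\varepsilon, x) \Big|_{p_1(\varepsilon)}^{p_0(\varepsilon)} - \int_I F_n(\varepsilon, x) g(\varepsilon, x) \, dx,$$

where $g(\varepsilon, x) = \partial r(\varepsilon, x) / \partial x$.

From now on $'$ denotes the derivative with respect to $\varepsilon$. Now,

$$H_n^{\varepsilon}(Z)' = r(\varepsilon, p_0(\varepsilon))' - D_n(\varepsilon),$$

where

$$D_n(\varepsilon) = \lim_{h \to 0} \frac{\int_I F_n(\varepsilon + h, x) g(\varepsilon + h, x) \, dx - \int_I F_n(\varepsilon, x) g(\varepsilon, x) \, dx}{h}.$$

We can decompose $D_n(\varepsilon)$ into two terms:

$$D_n(\varepsilon) = D_n^1(\varepsilon) + D_n^2(\varepsilon),$$

where

$$D_n^1(\varepsilon) = \lim_{h \to 0} \int_I \frac{F_n(\varepsilon + h, x) - F_n(\varepsilon, x)}{h} \, g(\varepsilon, x) \, dx,$$

and

$$D_n^2(\varepsilon) = \int_I F_n(\varepsilon, x) g'(\varepsilon, x) \, dx.$$

In order to compute $D_n^1(\varepsilon)$, we partition $I$ into two pieces: 1) small intervals $(x_{n,i}(\varepsilon), x_{n,i}(\varepsilon + h))$
and 2) the complement of the union of these neighborhoods, to yield:

$$D_n^1(\varepsilon) = \lim_{h \to 0} \int_I \frac{F_n(\varepsilon + h, x) - F_n(\varepsilon, x)}{h} \, g(\varepsilon, x) \, dx = -\sum_i p_{n,i}(\varepsilon) \, x_{n,i}(\varepsilon)' \, g(\varepsilon, x_{n,i}(\varepsilon)) + \int_I F_n'(\varepsilon, x) g(\varepsilon, x) \, dx.$$

Combining the foregoing expressions, we arrive at an expression for $H_n^{\varepsilon}(Z)'$:

$$H_n^{\varepsilon}(Z)' = r(\varepsilon, p_0(\varepsilon))' + \sum_i p_{n,i}(\varepsilon) \, x_{n,i}(\varepsilon)' \, g(\varepsilon, x_{n,i}(\varepsilon)) - \int_I F_n'(\varepsilon, x) g(\varepsilon, x) \, dx - \int_I F_n(\varepsilon, x) g'(\varepsilon, x) \, dx.$$

Write $H^{\varepsilon}(Z) = H(Z)$, $Q(\varepsilon) = Q$ and let $F(\varepsilon, x)$ be the cumulative distribution function
of $Q(\varepsilon)$.

We then show that $H_n^{\varepsilon}(Z)$ converges uniformly to $H^{\varepsilon}(Z)$ and $H_n^{\varepsilon}(Z)'$ converges uniformly
to some function; it follows that this function is $H^{\varepsilon}(Z)'$. This requires showing that the
integrands in the second and third terms of the previous expression converge to well-defined
functions.

We think of the $x_{n,i}(\varepsilon)$ as locations of point masses. So, we can think of $x_{n,i}(\varepsilon)'$ as an
instantaneous location change.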
1. 2nd term, Instantaneous Location Change (see the appendices): For $x \in \mathrm{supp}(Q(\varepsilon))$
and any sequence of points $x_{n_1,i_1}(\varepsilon), x_{n_2,i_2}(\varepsilon), \ldots$ approaching $x$, $K_1(\varepsilon, x) = \lim_{j \to \infty} x_{n_j,i_j}(\varepsilon)'$
is a well-defined continuous function.

2. 3rd term, Instantaneous Probability Change (see the appendices): Recall that
$\mathrm{supp}(Q(\varepsilon))$ is a Cantor set defined by a collection of "deleted" intervals: namely, $I^{d} =
(f_1(p_0), f_0(p_1))$, and all intervals of the form $f_{i_1} \circ f_{i_2} \circ \cdots \circ f_{i_n}(I^{d})$ (called deleted intervals
on level $n$). For $x$ belonging to a deleted interval on level $n$, define $K_2(\varepsilon, x) = F_n'(\varepsilon, x)$.
Since the union of deleted intervals is dense in $I$, we can extend $K_2(\varepsilon, x)$ to a function
on all $x \in I$, and we show that $K_2(\varepsilon, x)$ is a well-defined continuous function.

Using the boundedness of the instantaneous location change and probability change
(established in Appendix A and Appendix B) and the Arzela-Ascoli Theorem (note that the
convergence results in the appendices imply pointwise convergence of $H_n^{\varepsilon}(Z)'$, and Appendix A and
Appendix B imply equicontinuity of $H_n^{\varepsilon}(Z)'$), we obtain uniform convergence of $H_n^{\varepsilon}(Z)'$ to
$H^{\varepsilon}(Z)'$, which gives the result:

Theorem 3.7. In the non-overlapping case,

$$H^{\varepsilon}(Z)' = r(\varepsilon, p_0(\varepsilon))' + \int_{\mathrm{supp}(Q(\varepsilon))} K_1(\varepsilon, x) g(\varepsilon, x) \, dF(\varepsilon, x) - \int_I K_2(\varepsilon, x) g(\varepsilon, x) \, dx - \int_I F(\varepsilon, x) g'(\varepsilon, x) \, dx.$$

Note that the second term in this expression is a weighted mean of the instantaneous
location change and the third term in this expression is a weighted mean of the instantaneous
probability change.

Remark 3.8. Using the same technique, we can give a similar formula for the derivative of
$H^{\varepsilon}(Z)$ with respect to the $\pi_{ij}$'s when $\varepsilon > 0$. We can also give such formulae for higher derivatives
in a similar way.

Remark 3.9. The techniques in this section can be applied to give an expression for the
derivative of the entropy rate in the special overlapping case where $f_0(p_1) = f_1(p_0)$.

3.3 Derivatives in other cases 

1. If any two of the $\pi_{ij}$'s are equal to 0, then

$$H^{\varepsilon}(Z) = -\varepsilon \log \varepsilon - (1 - \varepsilon) \log(1 - \varepsilon),$$

and $H^{\varepsilon}(Z)$ is not differentiable with respect to $\varepsilon$ at $\varepsilon = 0$.

2. Of more interest, it was shown in [9] that $H(Z)$ is not differentiable with respect to
$\varepsilon$ at $\varepsilon = 0$ when exactly one of the $\pi_{ij}$'s is equal to 0. We briefly indicate how this is
related to (3.13). Consider the case: $\pi_{00} = 0$, $\pi_{01} = 1$, $0 < \pi_{10} < 1$. Then for $\varepsilon > 0$,

$$H(Z) = -\int_{I_0} r_0(x) \log r_0(x) \, dQ - \int_{I_1} r_0(x) \log r_0(x) \, dQ - \int_{I_0} r_1(x) \log r_1(x) \, dQ - \int_{I_1} r_1(x) \log r_1(x) \, dQ.$$

When $\varepsilon \to 0$, the lengths of $I_0$ and $I_1$ shrink to zero with $I_1$ approaching 0 and $I_0$ approaching
$\infty$. So, of the four terms above, as $\varepsilon \to 0$, the dominating term will be

$$\int_{I_0} r_0(x) \log r_0(x) \, dQ \sim \varepsilon \log \varepsilon,$$

and all the other three terms are bounded by $O(\varepsilon)$ (see (3.10) and (3.11)). This indicates
that $H(Z)$ is not differentiable with respect to $\varepsilon$ at $\varepsilon = 0$.

3. Consider the case that $\varepsilon = 0$ and all the $\pi_{ij}$'s are positive. As discussed in Example
4.1 of [2], the entropy rate is analytic as a function of $\varepsilon$ and the $\pi_{ij}$'s.

In [3] (and more generally in [13], [14]), an explicit formula was given for $H'(Z)$ at $\varepsilon = 0$
in this case. We briefly indicate how this is related to our results in Section 3.2.

Instead of considering the dynamics of $x_n$ on the real line, we consider those of $(a_n, b_n)$
on the 1-dimensional simplex

$$W = \{(w_1, w_2) : w_1 + w_2 = 1,\ w_i \ge 0\}.$$

Let $Q$ denote the limiting distribution of $(a_n, b_n)$ on $W$; then the entropy rate $H(Z)$ can be computed
as follows:

$$H(Z) = \int_W -\big(r_0(w)\log r_0(w) + r_1(w)\log r_1(w)\big)\,dQ,$$

where

$$r_0(w) = ((1-\varepsilon)\pi_{00} + \varepsilon\pi_{01})w_1 + ((1-\varepsilon)\pi_{10} + \varepsilon\pi_{11})w_2,$$

$$r_1(w) = (\varepsilon\pi_{00} + (1-\varepsilon)\pi_{01})w_1 + (\varepsilon\pi_{10} + (1-\varepsilon)\pi_{11})w_2.$$
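The integral against $Q$ can be approximated by simulation. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes the observation is $Z_n = Y_n \oplus E_n$ with i.i.d. noise of crossover probability `eps`, runs the normalized forward recursion $w \mapsto f_z(w)$ along one simulated path, and averages $-\log r_z(w)$. The function names, the seed and the sample size are all hypothetical choices.

```python
import math
import random

def r_values(w, pi, eps):
    """The functions r_0(w), r_1(w) from the display above."""
    w1, w2 = w
    r0 = ((1 - eps) * pi[0][0] + eps * pi[0][1]) * w1 + ((1 - eps) * pi[1][0] + eps * pi[1][1]) * w2
    r1 = (eps * pi[0][0] + (1 - eps) * pi[0][1]) * w1 + (eps * pi[1][0] + (1 - eps) * pi[1][1]) * w2
    return r0, r1

def filter_step(w, z, pi, eps):
    """Return (f_z(w), r_z(w)): the updated point of the simplex and its normalizer."""
    emit = [(1 - eps) if z == j else eps for j in (0, 1)]
    unnorm = [(w[0] * pi[0][j] + w[1] * pi[1][j]) * emit[j] for j in (0, 1)]
    r = unnorm[0] + unnorm[1]
    return (unnorm[0] / r, unnorm[1] / r), r

def estimate_entropy_rate(pi, eps, n=100000, seed=0):
    """Monte Carlo estimate (in nats) of H(Z) as the path average of -log r_z(w)."""
    rng = random.Random(seed)
    s = pi[0][1] + pi[1][0]
    w = (pi[1][0] / s, pi[0][1] / s)          # stationary distribution of the chain
    y = 0 if rng.random() < w[0] else 1       # hidden state drawn from stationarity
    total = 0.0
    for _ in range(n):
        y = 0 if rng.random() < pi[y][0] else 1      # hidden transition
        z = y if rng.random() >= eps else 1 - y      # binary symmetric noise
        w, r = filter_step(w, z, pi, eps)
        total -= math.log(r)
    return total / n

if __name__ == "__main__":
    pi = [[0.9, 0.1], [0.3, 0.7]]
    print(estimate_entropy_rate(pi, 0.1))
```

At `eps = 0.5` every normalizer equals 1/2, so the estimate reduces to $\log 2$, which gives a quick sanity check of the recursion.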

In order to calculate the derivative, we split the region of integration into two disjoint parts
$W = W^0 \cup W^1$ with

$$W^0 = \{t(0, 1) + (1-t)(1/2, 1/2) : 0 \le t \le 1\},$$

$$W^1 = \{t(1/2, 1/2) + (1-t)(1, 0) : 0 \le t < 1\}.$$

Let $r(w) = -(r_0(w)\log r_0(w) + r_1(w)\log r_1(w))$ and $H^i(Z) = \int_{W^i} r(w)\,dQ$; then

$$H(Z) = H^0(Z) + H^1(Z).$$

For $W^0$, we represent every point $(w_1, w_2)$ using the coordinate $w_1/w_2$. For $W^1$, we
represent every point $(w_1, w_2)$ using the coordinate $w_2/w_1$. Computation shows that $H^i_n(Z)$
converges uniformly to $H^i(Z)$ on $[0, 1/2]$. Note that the expressions in Theorem 3.7 are not
computable for $\varepsilon > 0$; however, we can apply similar uniform convergence ideas in each of
these regions to recover the formula given in [3] for $\varepsilon = 0$.

4. (Low SNR regime, $\varepsilon = 1/2$) In Corollary 6 of [8], it was shown that in the
symmetric case (i.e., $\pi_{01} = \pi_{10}$), the entropy rate approaches its maximum value at rate $(1/2 - \varepsilon)^4$ as $\varepsilon$
approaches 1/2. It can be shown that the entropy rates at $\varepsilon$ and $1 - \varepsilon$ are the same, and so
all odd order derivatives vanish at $\varepsilon = 1/2$. It follows that this result of [8] is equivalent to
the statement that in the symmetric case $H''(Z)|_{\varepsilon=1/2} = 0$. We generalize this result to the
non-symmetric case as follows:

$$H''(Z)|_{\varepsilon=1/2} = -4\left(\frac{\pi_{10} - \pi_{01}}{\pi_{10} + \pi_{01}}\right)^2.$$

For more details, see Appendix E.
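The formula can be checked numerically on finite blocks. The following sketch is only an illustration under assumptions: it computes the normalized block entropy $H(Z_1^n)/n$ in nats by brute-force enumeration for a small $n$, and takes a central second difference at $\varepsilon = 1/2$. It checks the block quantity, not the entropy rate itself; the block length `n`, the step `h` and all function names are hypothetical choices.

```python
import itertools
import math

def block_entropy_per_symbol(pi, eps, n):
    """H(Z_1, ..., Z_n) / n in nats for a stationary binary chain seen through a BSC(eps)."""
    s = pi[0][1] + pi[1][0]
    stationary = (pi[1][0] / s, pi[0][1] / s)
    total = 0.0
    for z in itertools.product((0, 1), repeat=n):
        # Forward recursion: alpha[j] = P(z_1, ..., z_k, hidden state at time k equals j).
        alpha = [stationary[j] * ((1 - eps) if z[0] == j else eps) for j in (0, 1)]
        for zk in z[1:]:
            alpha = [
                (alpha[0] * pi[0][j] + alpha[1] * pi[1][j]) * ((1 - eps) if zk == j else eps)
                for j in (0, 1)
            ]
        p = alpha[0] + alpha[1]
        if p > 0:
            total -= p * math.log(p)
    return total / n

def second_derivative_at_half(pi, n=6, h=1e-3):
    """Central second difference of the normalized block entropy at eps = 1/2."""
    f = lambda e: block_entropy_per_symbol(pi, e, n)
    return (f(0.5 + h) - 2 * f(0.5) + f(0.5 - h)) / h ** 2

def predicted_second_derivative(pi):
    """The closed form -4 ((pi10 - pi01) / (pi10 + pi01))^2 stated above."""
    return -4 * ((pi[1][0] - pi[0][1]) / (pi[1][0] + pi[0][1])) ** 2

if __name__ == "__main__":
    pi = [[0.9, 0.1], [0.3, 0.7]]
    print(second_derivative_at_half(pi), predicted_second_derivative(pi))
```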

Appendices 

A Proof of Boundedness of Instantaneous Location Change 

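Claim: For any fixed $0 < \eta < 1/2$, $x^{(k)}_{n,i}(\varepsilon) \le C_1(k, \eta)$ for $\eta \le \varepsilon \le 1/2$, where $C_1$ is a positive constant
depending only on $k$ and $\eta$.

Proof. We only prove the case when $k = 1$. Consider the iteration

$$x_{n+1} = f_{z_{n+1}}(\varepsilon, x_n).$$

Taking the derivative with respect to $\varepsilon$, we obtain

$$x'_{n+1} = \frac{\partial f_{z_{n+1}}}{\partial \varepsilon}(\varepsilon, x_n) + \frac{\partial f_{z_{n+1}}}{\partial x}(\varepsilon, x_n)\,x'_n.$$

Since $\frac{\partial f_{z_{n+1}}}{\partial \varepsilon}(\varepsilon, x_n)$ is uniformly bounded by a constant and $\frac{\partial f_{z_{n+1}}}{\partial x}(\varepsilon, x_n)$ is bounded
by $\rho$ with $0 < \rho < 1$, we conclude that $x'_n$ is uniformly bounded too. □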
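The last step is a geometric-series bound: if $|x'_{n+1}| \le C + \rho|x'_n|$ with $0 < \rho < 1$ and $x'_0 = 0$, then $|x'_n| \le C/(1-\rho)$ for all $n$. A minimal sketch of this worst case follows; the constants `c` and `rho` are illustrative and not taken from the paper.

```python
def worst_case_bounds(c, rho, steps, start=0.0):
    """Iterate u_{n+1} = c + rho * u_n, the largest value allowed by the inequality."""
    u = start
    values = []
    for _ in range(steps):
        u = c + rho * u
        values.append(u)
    return values

if __name__ == "__main__":
    vals = worst_case_bounds(c=2.0, rho=0.9, steps=200)
    # The iterates increase toward c / (1 - rho) and never exceed it.
    print(vals[-1], 2.0 / (1 - 0.9))
```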

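B Proof of Boundedness of Instantaneous Probability Change

Claim: For $x \notin \{x_{n,i}\}$ and $0 < \varepsilon < 1/2$, $F_n^{(k)}(\varepsilon, x) \le C_2(k)$, where $C_2$ is a positive constant
depending only on $k$.

Proof. We only prove the case when $k = 1$. For $x$ with $x_{n,2i} < x < x_{n,2i+1}$, we have
$F_n(\varepsilon, x) = F_{n-1}(\varepsilon, x)$, and consequently $\frac{\partial F_n(\varepsilon, x)}{\partial \varepsilon} = \frac{\partial F_{n-1}(\varepsilon, x)}{\partial \varepsilon}$. For $x$ with $x_{n,2i-1} < x < x_{n,2i}$,
$\frac{\partial F_n(\varepsilon, x)}{\partial \varepsilon} - \frac{\partial F_{n-1}(\varepsilon, x)}{\partial \varepsilon}$ is bounded by $C\rho_1^n$, where $C$ is a positive constant and $0 < \rho_1 < 1$ (see the
proof that $K_2$ is well-defined in Appendix D). Therefore we conclude that the instantaneous
probability change is uniformly bounded. □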

C Proof that $K_1$ is Well-defined

Proof. We need to prove that if two points $x_{n_1,i_1}$ and $x_{n_2,i_2}$ are close, then $x'_{n_1,i_1}$ and
$x'_{n_2,i_2}$ are also close. Note that in the non-overlapping case, if $x_{n_1,i_1}$ and $x_{n_2,i_2}$ are very close, their
corresponding symbolic sequences must share a long common tail. We shall prove that the
asymptotic dynamics of $x'_n$ do not depend on the starting point as long as the sequences have the
same long common tail. Without loss of generality, we assume that $z, \hat z$ have common tail
$z_1, z_2, \cdots, z_n$. In this case, the two dynamical systems start with different values $x_0, \hat x_0$ along
the same path. Now the two iterations produce



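$$x'_{n+1} = \frac{\partial f_{z_{n+1}}}{\partial \varepsilon}(\varepsilon, x_n) + \frac{\partial f_{z_{n+1}}}{\partial x}(\varepsilon, x_n)\,x'_n, \qquad \hat x'_{n+1} = \frac{\partial f_{z_{n+1}}}{\partial \varepsilon}(\varepsilon, \hat x_n) + \frac{\partial f_{z_{n+1}}}{\partial x}(\varepsilon, \hat x_n)\,\hat x'_n.$$

Taking the difference, we have

$$x'_{n+1} - \hat x'_{n+1} = \left(\frac{\partial f_{z_{n+1}}}{\partial \varepsilon}(\varepsilon, x_n) - \frac{\partial f_{z_{n+1}}}{\partial \varepsilon}(\varepsilon, \hat x_n)\right) + \left(\frac{\partial f_{z_{n+1}}}{\partial x}(\varepsilon, x_n) - \frac{\partial f_{z_{n+1}}}{\partial x}(\varepsilon, \hat x_n)\right)x'_n + \frac{\partial f_{z_{n+1}}}{\partial x}(\varepsilon, \hat x_n)\,(x'_n - \hat x'_n).$$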

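Since

• when $n \to \infty$, $x_n$ and $\hat x_n$ are getting close uniformly with respect to $\varepsilon$, and

• $\frac{\partial f_i}{\partial \varepsilon}(\varepsilon, \cdot)$ and $\frac{\partial f_i}{\partial x}(\varepsilon, \cdot)$ ($i = 0, 1$) are Lipschitz, and

• $f_i(\varepsilon, \cdot)$ ($i = 0, 1$) are $\rho$-contraction mappings,

we conclude that $x'_n$ and $\hat x'_n$ are very close uniformly with respect to $\varepsilon$. The well-definedness
of $K_1$ then follows. □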



D Proof that $K_2$ is Well-defined

Proof. Every deleted interval corresponds to a finite sequence of binary digits and $K_2$ is well
defined on these intervals. We order the deleted intervals on level $n$ from left to right:

$$I^d_{n,1},\ I^d_{n,2},\ \cdots,\ I^d_{n,2^{n-1}}.$$

We need to prove that if two deleted intervals $I^d_{m,i}$, $I^d_{n,j}$ are close, then $F_m(\varepsilon, I^d_{m,i})$ (which is defined
as $F_m(\varepsilon, x)$ with $x \in I^d_{m,i}$) and $F_n(\varepsilon, I^d_{n,j})$ are close. Assume $m \le n$; then the points $x_{n,k}$'s
in between $I^d_{m,i}$ and $I^d_{n,j}$ must have a long common tail. Suppose that the common tail is the
path $z_1, z_2, \cdots, z_n$, and let $q_i$ denote the sum of the probabilities associated with these points.
Note that as long as the sequences have a long common tail, the corresponding values of $K_2$
are getting closer and closer. For simplicity we only track one path for the time being. Then
we have

$$a_{i+1} = p_E(z_{i+1})(\pi_{00}a_i + \pi_{10}b_i),$$

$$b_{i+1} = p_E(\bar z_{i+1})(\pi_{01}a_i + \pi_{11}b_i).$$

It follows that

$$(a_{i+1} + b_{i+1}) \le \rho(a_i + b_i),$$

where $0 < \rho < 1$ and $\rho$ is defined as

$$\rho = \max\{(1-\varepsilon)\pi_{00} + \varepsilon\pi_{01},\ (1-\varepsilon)\pi_{10} + \varepsilon\pi_{11},\ \varepsilon\pi_{00} + (1-\varepsilon)\pi_{01},\ \varepsilon\pi_{10} + (1-\varepsilon)\pi_{11}\}.$$

Immediately we have

$$(a_n + b_n) \le \rho^n.$$

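Taking the derivative, we have

$$a'_{n+1} = -(\pi_{00}a_n + \pi_{10}b_n) + (1-\varepsilon)(\pi_{00}a'_n + \pi_{10}b'_n),$$

$$b'_{n+1} = (\pi_{01}a_n + \pi_{11}b_n) + \varepsilon(\pi_{01}a'_n + \pi_{11}b'_n).$$

In this case we obtain

$$|a'_{n+1}| + |b'_{n+1}| \le \rho(|a'_n| + |b'_n|) + \rho^n,$$

which implies that there is a positive constant $C$ and $\rho_1$ with $\rho < \rho_1 < 1$ such that

$$|a'_n| + |b'_n| \le C\rho_1^n.$$

Then we conclude $|a'_n| + |b'_n| \to 0$ as $n \to \infty$. Exactly the same derivation can be applied to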
multiple path, it follows that 

So no matter what level we started from the deleted intervals, as long as they have long 
common tails, the corresponding values of K2 function are close. Therefore K2 is well 
defined. □ 
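The two decay estimates in this proof can be checked along a sample path. The sketch below is illustrative only and rests on assumptions: the noise probability attached to $a$ is taken to be $1-\varepsilon$ when $z = 0$ and $\varepsilon$ when $z = 1$ (and the reverse for $b$), the recursion starts from $a_0 + b_0 = 1$ with zero derivatives, and the function names are hypothetical. Under these assumptions the inequalities give $a_n + b_n \le \rho^n$ and $|a'_n| + |b'_n| \le n\rho^{n-1}$.

```python
import random

def rho(pi, eps):
    """The contraction constant: the maximum of the four convex combinations."""
    return max(
        (1 - eps) * pi[0][0] + eps * pi[0][1],
        (1 - eps) * pi[1][0] + eps * pi[1][1],
        eps * pi[0][0] + (1 - eps) * pi[0][1],
        eps * pi[1][0] + (1 - eps) * pi[1][1],
    )

def track_path(pi, eps, path, a=0.5, b=0.5):
    """Run the (a, b) recursion and its derivative in eps along one symbol path."""
    da = db = 0.0
    history = []
    for z in path:
        # Noise probability and its eps-derivative for each hidden state (assumed form).
        p0, dp0 = (1 - eps, -1.0) if z == 0 else (eps, 1.0)
        p1, dp1 = (eps, 1.0) if z == 0 else (1 - eps, -1.0)
        sa = pi[0][0] * a + pi[1][0] * b
        sb = pi[0][1] * a + pi[1][1] * b
        dsa = pi[0][0] * da + pi[1][0] * db
        dsb = pi[0][1] * da + pi[1][1] * db
        a, b, da, db = p0 * sa, p1 * sb, dp0 * sa + p0 * dsa, dp1 * sb + p1 * dsb
        history.append((a + b, abs(da) + abs(db)))
    return history

if __name__ == "__main__":
    rng = random.Random(1)
    pi = [[0.7, 0.3], [0.4, 0.6]]
    path = [rng.randint(0, 1) for _ in range(40)]
    print(track_path(pi, 0.1, path)[-1], rho(pi, 0.1) ** 40)
```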

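E Computation of $H''(Z)|_{\varepsilon=1/2}$

Let

$$p_n = \big(p(z_1^n, E_n = 0),\ p(z_1^n, E_n = 1)\big)$$

and

$$M(z_{n-1}, z_n) = \begin{pmatrix} (1-\varepsilon)\,p_X(z_n|z_{n-1}) & \varepsilon\,p_X(\bar z_n|z_{n-1}) \\ (1-\varepsilon)\,p_X(z_n|\bar z_{n-1}) & \varepsilon\,p_X(\bar z_n|\bar z_{n-1}) \end{pmatrix},$$

where $\bar z$ denotes the complement of the binary symbol $z$. Then we have

$$p_n = p_{n-1}M(z_{n-1}, z_n).$$

Immediately we obtain

$$p_Z(z_1^n) = p_1 M(z_1, z_2)\cdots M(z_{n-1}, z_n)\mathbf{1}.$$

We consider the case when the channel is operating in the low SNR region. For convenience, we let

$$\delta = \frac{1}{2} - \varepsilon.$$

Thus when the SNR is very low, namely $\varepsilon \to \frac{1}{2}$, correspondingly we have $\delta \to 0$. Since $H(Z)$
is an even function of $\delta$, the odd order derivatives at $\delta = 0$ are all equal to 0. In the
sequel, we shall compute the second derivative of $H(Z)$ at $\delta = 0$.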

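In this case, we can rewrite the random matrix $M_i = M(z_i, z_{i+1})$ in the following way:

$$M_i = \frac{1}{2}\begin{pmatrix} p_X(z_{i+1}|z_i) & p_X(\bar z_{i+1}|z_i) \\ p_X(z_{i+1}|\bar z_i) & p_X(\bar z_{i+1}|\bar z_i) \end{pmatrix} + \delta\begin{pmatrix} p_X(z_{i+1}|z_i) & -p_X(\bar z_{i+1}|z_i) \\ p_X(z_{i+1}|\bar z_i) & -p_X(\bar z_{i+1}|\bar z_i) \end{pmatrix},$$

which we write as $M_i = M_i^{(0)} + \delta M_i^{(1)}$. For the special case when $i = 0$, we have

$$M_0 = \frac{1}{2}\,[p_X(z_1),\ p_X(\bar z_1)] + \delta\,[p_X(z_1),\ -p_X(\bar z_1)].$$

Then

$$p_Z(z_1^n) = M_0 M_1\cdots M_{n-1}\mathbf{1}.$$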



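Now define the function

$$R_n(\delta) = \sum_{z_1^n} p_Z(z_1^n)\log p_Z(z_1^n).$$

Then according to the definition of $H(Z)$,

$$H(Z) = -\lim_{n\to\infty}\frac{1}{n}R_n(\delta).$$

It can be checked that

$$\frac{\partial R_n(\delta)}{\partial\delta} = \sum_{z_1^n}\frac{\partial p_Z(z_1^n)}{\partial\delta}\big(\log p_Z(z_1^n) + 1\big).$$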



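Now

$$\left.\frac{\partial p_Z(z_1^n)}{\partial\delta}\right|_{\delta=0} = \sum_{i=0}^{n-1} M_0^{(0)}M_1^{(0)}\cdots M_{i-1}^{(0)}M_i^{(1)}M_{i+1}^{(0)}\cdots M_{n-1}^{(0)}\mathbf{1} = \left(\frac{1}{2}\right)^{n-1}\sum_{i=1}^{n}\big(p_X(z_i) - p_X(\bar z_i)\big).$$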
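The closed form for the first derivative at $\delta = 0$ can be compared with a finite difference. The sketch below is an illustration under assumptions: $\delta = 1/2 - \varepsilon$, the chain is stationary, sequence probabilities are computed with a plain forward recursion, and the function names, block length and step size are hypothetical choices.

```python
import itertools

def stationary(pi):
    s = pi[0][1] + pi[1][0]
    return (pi[1][0] / s, pi[0][1] / s)

def sequence_probability(z, pi, eps):
    """P(Z_1^n = z) for a stationary binary chain observed through a BSC(eps)."""
    px = stationary(pi)
    alpha = [px[j] * ((1 - eps) if z[0] == j else eps) for j in (0, 1)]
    for zk in z[1:]:
        alpha = [
            (alpha[0] * pi[0][j] + alpha[1] * pi[1][j]) * ((1 - eps) if zk == j else eps)
            for j in (0, 1)
        ]
    return alpha[0] + alpha[1]

def numeric_derivative(z, pi, h=1e-5):
    """Central difference in delta at delta = 0, using eps = 1/2 - delta."""
    return (sequence_probability(z, pi, 0.5 - h) - sequence_probability(z, pi, 0.5 + h)) / (2 * h)

def closed_form(z, pi):
    """(1/2)^(n-1) * sum_i (p_X(z_i) - p_X(complement of z_i))."""
    px = stationary(pi)
    return 0.5 ** (len(z) - 1) * sum(px[zi] - px[1 - zi] for zi in z)

def max_gap(pi, n=5):
    """Largest discrepancy between the two expressions over all sequences of length n."""
    return max(
        abs(numeric_derivative(z, pi) - closed_form(z, pi))
        for z in itertools.product((0, 1), repeat=n)
    )

if __name__ == "__main__":
    print(max_gap([[0.9, 0.1], [0.3, 0.7]]))
```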



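Again simple calculations will lead to

$$\frac{\partial^2 R_n(\delta)}{\partial\delta^2} = \sum_{z_1^n}\left[\frac{\partial^2 p_Z(z_1^n)}{\partial\delta^2}\log p_Z(z_1^n) + \frac{1}{p_Z(z_1^n)}\left(\frac{\partial p_Z(z_1^n)}{\partial\delta}\right)^2 + \frac{\partial^2 p_Z(z_1^n)}{\partial\delta^2}\right].$$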



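Since

$$\left.\frac{\partial^2 p_Z(z_1^n)}{\partial\delta^2}\right|_{\delta=0} = 2\sum_{0\le i<j\le n-1} M_0^{(0)}\cdots M_{i-1}^{(0)}M_i^{(1)}M_{i+1}^{(0)}\cdots M_{j-1}^{(0)}M_j^{(1)}M_{j+1}^{(0)}\cdots M_{n-1}^{(0)}\mathbf{1}$$

$$= 2\left(\frac{1}{2}\right)^{n-2}\sum_{i<j}\,[p_X(z_{i+1}),\ -p_X(\bar z_{i+1})]\begin{pmatrix} p_X(z_{j+1}|z_{i+1}) & -p_X(\bar z_{j+1}|z_{i+1}) \\ p_X(z_{j+1}|\bar z_{i+1}) & -p_X(\bar z_{j+1}|\bar z_{i+1}) \end{pmatrix}\mathbf{1}$$

$$= 2\left(\frac{1}{2}\right)^{n-2}\sum_{i<j}\big(p_X(z_{j+1}, z_{i+1}) - p_X(\bar z_{j+1}, z_{i+1}) - p_X(z_{j+1}, \bar z_{i+1}) + p_X(\bar z_{j+1}, \bar z_{i+1})\big),$$

the sum of these second derivatives over all $z_1^n$ vanishes; together with $p_Z(z_1^n) = 2^{-n}$ at $\delta = 0$, we have

$$\left.\frac{\partial^2 R_n(\delta)}{\partial\delta^2}\right|_{\delta=0} = 2^n\sum_{z_1^n}\left(\left(\frac{1}{2}\right)^{n-1}\sum_{i=1}^{n}\big(p_X(z_i) - p_X(\bar z_i)\big)\right)^2.$$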



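Let $x, y$ temporarily denote the stationary distribution

$$p_X(0) = \frac{\pi_{10}}{\pi_{01} + \pi_{10}}, \qquad p_X(1) = \frac{\pi_{01}}{\pi_{01} + \pi_{10}},$$

respectively. Then

$$\left.\frac{\partial^2 R_n(\delta)}{\partial\delta^2}\right|_{\delta=0} = \left(\frac{1}{2}\right)^{n-2}\sum_{i=0}^{n} C_n^i\,(2ix + 2(n-i)y - n)^2$$

$$= \left(\frac{1}{2}\right)^{n-2}\sum_{i=0}^{n} C_n^i\,((2x - 2y)i + 2ny - n)^2$$

$$= \left(\frac{1}{2}\right)^{n-2}\left((2x - 2y)^2\sum_{i=0}^{n} C_n^i\,i^2 + (2ny - n)^2\sum_{i=0}^{n} C_n^i + 2(2x - 2y)(2ny - n)\sum_{i=0}^{n} C_n^i\,i\right).$$

Using the following two combinatorial identities

$$\sum_{i=0}^{n} i\,C_n^i = n2^{n-1}$$

and

$$\sum_{i=0}^{n} i^2\,C_n^i = n(n-1)2^{n-2} + n2^{n-1},$$

we derive

$$\left.\frac{\partial^2 R_n(\delta)}{\partial\delta^2}\right|_{\delta=0} = \left(\frac{1}{2}\right)^{n-2}\Big((x - y)^2\big(n(n-1)2^n + n2^{n+1}\big) + n^2 2^n(2y - 1)^2 + 2(x - y)(2y - 1)n^2 2^n\Big)$$

$$= 4n(x - y)^2.$$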
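The two identities and the final simplification are easy to confirm by direct summation. The sketch below is only a numerical check for small $n$; it assumes $x + y = 1$ as for a stationary distribution, and the function names are illustrative.

```python
from math import comb, isclose

def binomial_moments(n):
    """Return (sum C(n,i), sum i*C(n,i), sum i^2*C(n,i)) over i = 0..n."""
    s0 = sum(comb(n, i) for i in range(n + 1))
    s1 = sum(i * comb(n, i) for i in range(n + 1))
    s2 = sum(i * i * comb(n, i) for i in range(n + 1))
    return s0, s1, s2

def second_derivative_sum(n, x):
    """(1/2)^(n-2) * sum_i C(n,i) * (2ix + 2(n-i)y - n)^2 with y = 1 - x."""
    y = 1 - x
    total = sum(comb(n, i) * (2 * i * x + 2 * (n - i) * y - n) ** 2 for i in range(n + 1))
    return total / 2 ** (n - 2)

if __name__ == "__main__":
    for n in range(2, 8):
        x = 0.75
        print(n, second_derivative_sum(n, x), 4 * n * (x - (1 - x)) ** 2)
```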

From the fact that the derivatives of H{Z) with respect to e are uniformly bounded on 
[0, 1/2] (see j^, also implied by Theorem 1.1 of 2J and the computation of H"^ {Z)\i;=q) , we 
draw the conclusion that the second coefficient of H{Z) is equal to 



$$H''(Z)|_{\varepsilon=1/2} = -4\left(\frac{\pi_{10} - \pi_{01}}{\pi_{10} + \pi_{01}}\right)^2.$$



References 



[1] D. Blackwell. The entropy of functions of finite-state Markov chains. Trans. First
Prague Conf. Information Theory, Statistical Decision Functions, Random Processes,
pages 13-20, 1957.

[2] G. Han and B. Marcus. Analyticity of entropy rate of hidden Markov chains.
http://front.math.ucdavis.edu/math.PR/0507235. Submitted to IEEE Transactions on
Information Theory; a preliminary version can be found in Proc. of IEEE International
Symposium on Information Theory, Adelaide, Australia, September 4-9, 2005,
pages 2193-2197.

[3] T. Holliday, A. Goldsmith, and P. Glynn. On entropy and Lyapunov exponents for
finite state channels. 2003. Available at
http://wsl.stanford.edu/Publications/THolliday/Lyapunov.pdf.

[4] P. Jacquet, G. Seroussi, and W. Szpankowski. On the entropy of a hidden Markov
process. In Proceedings of the 2004 IEEE International Symposium on Information
Theory, page 10, Chicago, U.S.A., 2004.

[5] J. Kemeny and J. Snell. Finite Markov chains. Princeton, N.J.: Van Nostrand, 1960.

[6] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge
University Press, 1995.

[7] B. Marcus, K. Petersen, and S. Williams. Transmission rates and factors of Markov
chains. Contemporary Mathematics, 26:279-294, 1984.

[8] E. Ordentlich and T. Weissman. On the optimality of symbol by symbol filtering and
denoising. IEEE Transactions on Information Theory, 52(1):19-40, Jan. 2006.

[9] E. Ordentlich and T. Weissman. New bounds on the entropy rate of hidden Markov
process. IEEE Information Theory Workshop, 24-29 Oct. 2004, pages 117-122.

[10] E. Ordentlich and T. Weissman. Personal communication.

[11] Y. Peres. Analytic dependence of Lyapunov exponents on transition probabilities. Volume
1486 of Lecture Notes in Mathematics, Lyapunov Exponents, Proceedings of a
Workshop. Springer Verlag, 1990.

[12] Y. Peres. Domains of analytic continuation for the top Lyapunov exponent. Ann. Inst.
H. Poincaré Probab. Statist., 28(1):131-148, 1992.

[13] O. Zuk, I. Kanter, and E. Domany. Asymptotics of the entropy rate for a hidden Markov
process. J. Stat. Phys., 121(3-4):343-360, 2005.

[14] O. Zuk, E. Domany, I. Kanter, and M. Aizenman. Taylor series expansions for the
entropy rate of hidden Markov processes. ICC 2006, Istanbul.



